Vendor integration dependencies.

This commit is contained in:
Timo Reimann 2017-02-07 22:33:23 +01:00
parent dd5e3fba01
commit 55b57c736b
2451 changed files with 731611 additions and 0 deletions

354
integration/vendor/github.com/hashicorp/consul/LICENSE generated vendored Normal file
View file

@ -0,0 +1,354 @@
Mozilla Public License, version 2.0
1. Definitions
1.1. “Contributor”
means each individual or legal entity that creates, contributes to the
creation of, or owns Covered Software.
1.2. “Contributor Version”
means the combination of the Contributions of others (if any) used by a
Contributor and that particular Contributors Contribution.
1.3. “Contribution”
means Covered Software of a particular Contributor.
1.4. “Covered Software”
means Source Code Form to which the initial Contributor has attached the
notice in Exhibit A, the Executable Form of such Source Code Form, and
Modifications of such Source Code Form, in each case including portions
thereof.
1.5. “Incompatible With Secondary Licenses”
means
a. that the initial Contributor has attached the notice described in
Exhibit B to the Covered Software; or
b. that the Covered Software was made available under the terms of version
1.1 or earlier of the License, but not also under the terms of a
Secondary License.
1.6. “Executable Form”
means any form of the work other than Source Code Form.
1.7. “Larger Work”
means a work that combines Covered Software with other material, in a separate
file or files, that is not Covered Software.
1.8. “License”
means this document.
1.9. “Licensable”
means having the right to grant, to the maximum extent possible, whether at the
time of the initial grant or subsequently, any and all of the rights conveyed by
this License.
1.10. “Modifications”
means any of the following:
a. any file in Source Code Form that results from an addition to, deletion
from, or modification of the contents of Covered Software; or
b. any new file in Source Code Form that contains any Covered Software.
1.11. “Patent Claims” of a Contributor
means any patent claim(s), including without limitation, method, process,
and apparatus claims, in any patent Licensable by such Contributor that
would be infringed, but for the grant of the License, by the making,
using, selling, offering for sale, having made, import, or transfer of
either its Contributions or its Contributor Version.
1.12. “Secondary License”
means either the GNU General Public License, Version 2.0, the GNU Lesser
General Public License, Version 2.1, the GNU Affero General Public
License, Version 3.0, or any later versions of those licenses.
1.13. “Source Code Form”
means the form of the work preferred for making modifications.
1.14. “You” (or “Your”)
means an individual or a legal entity exercising rights under this
License. For legal entities, “You” includes any entity that controls, is
controlled by, or is under common control with You. For purposes of this
definition, “control” means (a) the power, direct or indirect, to cause
the direction or management of such entity, whether by contract or
otherwise, or (b) ownership of more than fifty percent (50%) of the
outstanding shares or beneficial ownership of such entity.
2. License Grants and Conditions
2.1. Grants
Each Contributor hereby grants You a world-wide, royalty-free,
non-exclusive license:
a. under intellectual property rights (other than patent or trademark)
Licensable by such Contributor to use, reproduce, make available,
modify, display, perform, distribute, and otherwise exploit its
Contributions, either on an unmodified basis, with Modifications, or as
part of a Larger Work; and
b. under Patent Claims of such Contributor to make, use, sell, offer for
sale, have made, import, and otherwise transfer either its Contributions
or its Contributor Version.
2.2. Effective Date
The licenses granted in Section 2.1 with respect to any Contribution become
effective for each Contribution on the date the Contributor first distributes
such Contribution.
2.3. Limitations on Grant Scope
The licenses granted in this Section 2 are the only rights granted under this
License. No additional rights or licenses will be implied from the distribution
or licensing of Covered Software under this License. Notwithstanding Section
2.1(b) above, no patent license is granted by a Contributor:
a. for any code that a Contributor has removed from Covered Software; or
b. for infringements caused by: (i) Your and any other third partys
modifications of Covered Software, or (ii) the combination of its
Contributions with other software (except as part of its Contributor
Version); or
c. under Patent Claims infringed by Covered Software in the absence of its
Contributions.
This License does not grant any rights in the trademarks, service marks, or
logos of any Contributor (except as may be necessary to comply with the
notice requirements in Section 3.4).
2.4. Subsequent Licenses
No Contributor makes additional grants as a result of Your choice to
distribute the Covered Software under a subsequent version of this License
(see Section 10.2) or under the terms of a Secondary License (if permitted
under the terms of Section 3.3).
2.5. Representation
Each Contributor represents that the Contributor believes its Contributions
are its original creation(s) or it has sufficient rights to grant the
rights to its Contributions conveyed by this License.
2.6. Fair Use
This License is not intended to limit any rights You have under applicable
copyright doctrines of fair use, fair dealing, or other equivalents.
2.7. Conditions
Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in
Section 2.1.
3. Responsibilities
3.1. Distribution of Source Form
All distribution of Covered Software in Source Code Form, including any
Modifications that You create or to which You contribute, must be under the
terms of this License. You must inform recipients that the Source Code Form
of the Covered Software is governed by the terms of this License, and how
they can obtain a copy of this License. You may not attempt to alter or
restrict the recipients rights in the Source Code Form.
3.2. Distribution of Executable Form
If You distribute Covered Software in Executable Form then:
a. such Covered Software must also be made available in Source Code Form,
as described in Section 3.1, and You must inform recipients of the
Executable Form how they can obtain a copy of such Source Code Form by
reasonable means in a timely manner, at a charge no more than the cost
of distribution to the recipient; and
b. You may distribute such Executable Form under the terms of this License,
or sublicense it under different terms, provided that the license for
the Executable Form does not attempt to limit or alter the recipients
rights in the Source Code Form under this License.
3.3. Distribution of a Larger Work
You may create and distribute a Larger Work under terms of Your choice,
provided that You also comply with the requirements of this License for the
Covered Software. If the Larger Work is a combination of Covered Software
with a work governed by one or more Secondary Licenses, and the Covered
Software is not Incompatible With Secondary Licenses, this License permits
You to additionally distribute such Covered Software under the terms of
such Secondary License(s), so that the recipient of the Larger Work may, at
their option, further distribute the Covered Software under the terms of
either this License or such Secondary License(s).
3.4. Notices
You may not remove or alter the substance of any license notices (including
copyright notices, patent notices, disclaimers of warranty, or limitations
of liability) contained within the Source Code Form of the Covered
Software, except that You may alter any license notices to the extent
required to remedy known factual inaccuracies.
3.5. Application of Additional Terms
You may choose to offer, and to charge a fee for, warranty, support,
indemnity or liability obligations to one or more recipients of Covered
Software. However, You may do so only on Your own behalf, and not on behalf
of any Contributor. You must make it absolutely clear that any such
warranty, support, indemnity, or liability obligation is offered by You
alone, and You hereby agree to indemnify every Contributor for any
liability incurred by such Contributor as a result of warranty, support,
indemnity or liability terms You offer. You may include additional
disclaimers of warranty and limitations of liability specific to any
jurisdiction.
4. Inability to Comply Due to Statute or Regulation
If it is impossible for You to comply with any of the terms of this License
with respect to some or all of the Covered Software due to statute, judicial
order, or regulation then You must: (a) comply with the terms of this License
to the maximum extent possible; and (b) describe the limitations and the code
they affect. Such description must be placed in a text file included with all
distributions of the Covered Software under this License. Except to the
extent prohibited by statute or regulation, such description must be
sufficiently detailed for a recipient of ordinary skill to be able to
understand it.
5. Termination
5.1. The rights granted under this License will terminate automatically if You
fail to comply with any of its terms. However, if You become compliant,
then the rights granted under this License from a particular Contributor
are reinstated (a) provisionally, unless and until such Contributor
explicitly and finally terminates Your grants, and (b) on an ongoing basis,
if such Contributor fails to notify You of the non-compliance by some
reasonable means prior to 60 days after You have come back into compliance.
Moreover, Your grants from a particular Contributor are reinstated on an
ongoing basis if such Contributor notifies You of the non-compliance by
some reasonable means, this is the first time You have received notice of
non-compliance with this License from such Contributor, and You become
compliant prior to 30 days after Your receipt of the notice.
5.2. If You initiate litigation against any entity by asserting a patent
infringement claim (excluding declaratory judgment actions, counter-claims,
and cross-claims) alleging that a Contributor Version directly or
indirectly infringes any patent, then the rights granted to You by any and
all Contributors for the Covered Software under Section 2.1 of this License
shall terminate.
5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user
license agreements (excluding distributors and resellers) which have been
validly granted by You or Your distributors under this License prior to
termination shall survive termination.
6. Disclaimer of Warranty
Covered Software is provided under this License on an “as is” basis, without
warranty of any kind, either expressed, implied, or statutory, including,
without limitation, warranties that the Covered Software is free of defects,
merchantable, fit for a particular purpose or non-infringing. The entire
risk as to the quality and performance of the Covered Software is with You.
Should any Covered Software prove defective in any respect, You (not any
Contributor) assume the cost of any necessary servicing, repair, or
correction. This disclaimer of warranty constitutes an essential part of this
License. No use of any Covered Software is authorized under this License
except under this disclaimer.
7. Limitation of Liability
Under no circumstances and under no legal theory, whether tort (including
negligence), contract, or otherwise, shall any Contributor, or anyone who
distributes Covered Software as permitted above, be liable to You for any
direct, indirect, special, incidental, or consequential damages of any
character including, without limitation, damages for lost profits, loss of
goodwill, work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses, even if such party shall have been
informed of the possibility of such damages. This limitation of liability
shall not apply to liability for death or personal injury resulting from such
partys negligence to the extent applicable law prohibits such limitation.
Some jurisdictions do not allow the exclusion or limitation of incidental or
consequential damages, so this exclusion and limitation may not apply to You.
8. Litigation
Any litigation relating to this License may be brought only in the courts of
a jurisdiction where the defendant maintains its principal place of business
and such litigation shall be governed by laws of that jurisdiction, without
reference to its conflict-of-law provisions. Nothing in this Section shall
prevent a partys ability to bring cross-claims or counter-claims.
9. Miscellaneous
This License represents the complete agreement concerning the subject matter
hereof. If any provision of this License is held to be unenforceable, such
provision shall be reformed only to the extent necessary to make it
enforceable. Any law or regulation which provides that the language of a
contract shall be construed against the drafter shall not be used to construe
this License against a Contributor.
10. Versions of the License
10.1. New Versions
Mozilla Foundation is the license steward. Except as provided in Section
10.3, no one other than the license steward has the right to modify or
publish new versions of this License. Each version will be given a
distinguishing version number.
10.2. Effect of New Versions
You may distribute the Covered Software under the terms of the version of
the License under which You originally received the Covered Software, or
under the terms of any subsequent version published by the license
steward.
10.3. Modified Versions
If you create software not governed by this License, and you want to
create a new license for such software, you may create and use a modified
version of this License if you rename the license and remove any
references to the name of the license steward (except to note that such
modified license differs from this License).
10.4. Distributing Source Code Form that is Incompatible With Secondary Licenses
If You choose to distribute Source Code Form that is Incompatible With
Secondary Licenses under the terms of this version of the License, the
notice described in Exhibit B of this License must be attached.
Exhibit A - Source Code Form License Notice
This Source Code Form is subject to the
terms of the Mozilla Public License, v.
2.0. If a copy of the MPL was not
distributed with this file, You can
obtain one at
http://mozilla.org/MPL/2.0/.
If it is not possible or desirable to put the notice in a particular file, then
You may include the notice in a location (such as a LICENSE file in a relevant
directory) where a recipient would be likely to look for such a notice.
You may add additional accurate notices of copyright ownership.
Exhibit B - “Incompatible With Secondary Licenses” Notice
This Source Code Form is “Incompatible
With Secondary Licenses”, as defined by
the Mozilla Public License, v. 2.0.

View file

@ -0,0 +1,140 @@
package api
const (
// ACLCLientType is the client type token
ACLClientType = "client"
// ACLManagementType is the management type token
ACLManagementType = "management"
)
// ACLEntry is used to represent an ACL entry
type ACLEntry struct {
CreateIndex uint64
ModifyIndex uint64
ID string
Name string
Type string
Rules string
}
// ACL can be used to query the ACL endpoints
type ACL struct {
c *Client
}
// ACL returns a handle to the ACL endpoints
func (c *Client) ACL() *ACL {
return &ACL{c}
}
// Create is used to generate a new token with the given parameters
func (a *ACL) Create(acl *ACLEntry, q *WriteOptions) (string, *WriteMeta, error) {
r := a.c.newRequest("PUT", "/v1/acl/create")
r.setWriteOptions(q)
r.obj = acl
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return "", nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
var out struct{ ID string }
if err := decodeBody(resp, &out); err != nil {
return "", nil, err
}
return out.ID, wm, nil
}
// Update is used to update the rules of an existing token
func (a *ACL) Update(acl *ACLEntry, q *WriteOptions) (*WriteMeta, error) {
r := a.c.newRequest("PUT", "/v1/acl/update")
r.setWriteOptions(q)
r.obj = acl
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
return wm, nil
}
// Destroy is used to destroy a given ACL token ID
func (a *ACL) Destroy(id string, q *WriteOptions) (*WriteMeta, error) {
r := a.c.newRequest("PUT", "/v1/acl/destroy/"+id)
r.setWriteOptions(q)
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
return wm, nil
}
// Clone is used to return a new token cloned from an existing one
func (a *ACL) Clone(id string, q *WriteOptions) (string, *WriteMeta, error) {
r := a.c.newRequest("PUT", "/v1/acl/clone/"+id)
r.setWriteOptions(q)
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return "", nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
var out struct{ ID string }
if err := decodeBody(resp, &out); err != nil {
return "", nil, err
}
return out.ID, wm, nil
}
// Info is used to query for information about an ACL token
func (a *ACL) Info(id string, q *QueryOptions) (*ACLEntry, *QueryMeta, error) {
r := a.c.newRequest("GET", "/v1/acl/info/"+id)
r.setQueryOptions(q)
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var entries []*ACLEntry
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
if len(entries) > 0 {
return entries[0], qm, nil
}
return nil, qm, nil
}
// List is used to get all the ACL tokens
func (a *ACL) List(q *QueryOptions) ([]*ACLEntry, *QueryMeta, error) {
r := a.c.newRequest("GET", "/v1/acl/list")
r.setQueryOptions(q)
rtt, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var entries []*ACLEntry
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
return entries, qm, nil
}

View file

@ -0,0 +1,411 @@
package api
import (
"fmt"
)
// AgentCheck represents a check known to the agent
type AgentCheck struct {
Node string
CheckID string
Name string
Status string
Notes string
Output string
ServiceID string
ServiceName string
}
// AgentService represents a service known to the agent
type AgentService struct {
ID string
Service string
Tags []string
Port int
Address string
EnableTagOverride bool
}
// AgentMember represents a cluster member known to the agent
type AgentMember struct {
Name string
Addr string
Port uint16
Tags map[string]string
Status int
ProtocolMin uint8
ProtocolMax uint8
ProtocolCur uint8
DelegateMin uint8
DelegateMax uint8
DelegateCur uint8
}
// AgentServiceRegistration is used to register a new service
type AgentServiceRegistration struct {
ID string `json:",omitempty"`
Name string `json:",omitempty"`
Tags []string `json:",omitempty"`
Port int `json:",omitempty"`
Address string `json:",omitempty"`
EnableTagOverride bool `json:",omitempty"`
Check *AgentServiceCheck
Checks AgentServiceChecks
}
// AgentCheckRegistration is used to register a new check
type AgentCheckRegistration struct {
ID string `json:",omitempty"`
Name string `json:",omitempty"`
Notes string `json:",omitempty"`
ServiceID string `json:",omitempty"`
AgentServiceCheck
}
// AgentServiceCheck is used to define a node or service level check
type AgentServiceCheck struct {
Script string `json:",omitempty"`
DockerContainerID string `json:",omitempty"`
Shell string `json:",omitempty"` // Only supported for Docker.
Interval string `json:",omitempty"`
Timeout string `json:",omitempty"`
TTL string `json:",omitempty"`
HTTP string `json:",omitempty"`
TCP string `json:",omitempty"`
Status string `json:",omitempty"`
// In Consul 0.7 and later, checks that are associated with a service
// may also contain this optional DeregisterCriticalServiceAfter field,
// which is a timeout in the same Go time format as Interval and TTL. If
// a check is in the critical state for more than this configured value,
// then its associated service (and all of its associated checks) will
// automatically be deregistered.
DeregisterCriticalServiceAfter string `json:",omitempty"`
}
type AgentServiceChecks []*AgentServiceCheck
// Agent can be used to query the Agent endpoints
type Agent struct {
c *Client
// cache the node name
nodeName string
}
// Agent returns a handle to the agent endpoints
func (c *Client) Agent() *Agent {
return &Agent{c: c}
}
// Self is used to query the agent we are speaking to for
// information about itself
func (a *Agent) Self() (map[string]map[string]interface{}, error) {
r := a.c.newRequest("GET", "/v1/agent/self")
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out map[string]map[string]interface{}
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// NodeName is used to get the node name of the agent
func (a *Agent) NodeName() (string, error) {
if a.nodeName != "" {
return a.nodeName, nil
}
info, err := a.Self()
if err != nil {
return "", err
}
name := info["Config"]["NodeName"].(string)
a.nodeName = name
return name, nil
}
// Checks returns the locally registered checks
func (a *Agent) Checks() (map[string]*AgentCheck, error) {
r := a.c.newRequest("GET", "/v1/agent/checks")
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out map[string]*AgentCheck
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// Services returns the locally registered services
func (a *Agent) Services() (map[string]*AgentService, error) {
r := a.c.newRequest("GET", "/v1/agent/services")
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out map[string]*AgentService
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// Members returns the known gossip members. The WAN
// flag can be used to query a server for WAN members.
func (a *Agent) Members(wan bool) ([]*AgentMember, error) {
r := a.c.newRequest("GET", "/v1/agent/members")
if wan {
r.params.Set("wan", "1")
}
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out []*AgentMember
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// ServiceRegister is used to register a new service with
// the local agent
func (a *Agent) ServiceRegister(service *AgentServiceRegistration) error {
r := a.c.newRequest("PUT", "/v1/agent/service/register")
r.obj = service
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// ServiceDeregister is used to deregister a service with
// the local agent
func (a *Agent) ServiceDeregister(serviceID string) error {
r := a.c.newRequest("PUT", "/v1/agent/service/deregister/"+serviceID)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// PassTTL is used to set a TTL check to the passing state.
//
// DEPRECATION NOTICE: This interface is deprecated in favor of UpdateTTL().
// The client interface will be removed in 0.8 or changed to use
// UpdateTTL()'s endpoint and the server endpoints will be removed in 0.9.
func (a *Agent) PassTTL(checkID, note string) error {
return a.updateTTL(checkID, note, "pass")
}
// WarnTTL is used to set a TTL check to the warning state.
//
// DEPRECATION NOTICE: This interface is deprecated in favor of UpdateTTL().
// The client interface will be removed in 0.8 or changed to use
// UpdateTTL()'s endpoint and the server endpoints will be removed in 0.9.
func (a *Agent) WarnTTL(checkID, note string) error {
return a.updateTTL(checkID, note, "warn")
}
// FailTTL is used to set a TTL check to the failing state.
//
// DEPRECATION NOTICE: This interface is deprecated in favor of UpdateTTL().
// The client interface will be removed in 0.8 or changed to use
// UpdateTTL()'s endpoint and the server endpoints will be removed in 0.9.
func (a *Agent) FailTTL(checkID, note string) error {
return a.updateTTL(checkID, note, "fail")
}
// updateTTL is used to update the TTL of a check. This is the internal
// method that uses the old API that's present in Consul versions prior to
// 0.6.4. Since Consul didn't have an analogous "update" API before it seemed
// ok to break this (former) UpdateTTL in favor of the new UpdateTTL below,
// but keep the old Pass/Warn/Fail methods using the old API under the hood.
//
// DEPRECATION NOTICE: This interface is deprecated in favor of UpdateTTL().
// The client interface will be removed in 0.8 and the server endpoints will
// be removed in 0.9.
func (a *Agent) updateTTL(checkID, note, status string) error {
switch status {
case "pass":
case "warn":
case "fail":
default:
return fmt.Errorf("Invalid status: %s", status)
}
endpoint := fmt.Sprintf("/v1/agent/check/%s/%s", status, checkID)
r := a.c.newRequest("PUT", endpoint)
r.params.Set("note", note)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// checkUpdate is the payload for a PUT for a check update.
type checkUpdate struct {
// Status is one of the api.Health* states: HealthPassing
// ("passing"), HealthWarning ("warning"), or HealthCritical
// ("critical").
Status string
// Output is the information to post to the UI for operators as the
// output of the process that decided to hit the TTL check. This is
// different from the note field that's associated with the check
// itself.
Output string
}
// UpdateTTL is used to update the TTL of a check. This uses the newer API
// that was introduced in Consul 0.6.4 and later. We translate the old status
// strings for compatibility (though a newer version of Consul will still be
// required to use this API).
func (a *Agent) UpdateTTL(checkID, output, status string) error {
switch status {
case "pass", HealthPassing:
status = HealthPassing
case "warn", HealthWarning:
status = HealthWarning
case "fail", HealthCritical:
status = HealthCritical
default:
return fmt.Errorf("Invalid status: %s", status)
}
endpoint := fmt.Sprintf("/v1/agent/check/update/%s", checkID)
r := a.c.newRequest("PUT", endpoint)
r.obj = &checkUpdate{
Status: status,
Output: output,
}
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// CheckRegister is used to register a new check with
// the local agent
func (a *Agent) CheckRegister(check *AgentCheckRegistration) error {
r := a.c.newRequest("PUT", "/v1/agent/check/register")
r.obj = check
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// CheckDeregister is used to deregister a check with
// the local agent
func (a *Agent) CheckDeregister(checkID string) error {
r := a.c.newRequest("PUT", "/v1/agent/check/deregister/"+checkID)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// Join is used to instruct the agent to attempt a join to
// another cluster member
func (a *Agent) Join(addr string, wan bool) error {
r := a.c.newRequest("PUT", "/v1/agent/join/"+addr)
if wan {
r.params.Set("wan", "1")
}
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// ForceLeave is used to have the agent eject a failed node
func (a *Agent) ForceLeave(node string) error {
r := a.c.newRequest("PUT", "/v1/agent/force-leave/"+node)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// EnableServiceMaintenance toggles service maintenance mode on
// for the given service ID.
func (a *Agent) EnableServiceMaintenance(serviceID, reason string) error {
r := a.c.newRequest("PUT", "/v1/agent/service/maintenance/"+serviceID)
r.params.Set("enable", "true")
r.params.Set("reason", reason)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// DisableServiceMaintenance toggles service maintenance mode off
// for the given service ID.
func (a *Agent) DisableServiceMaintenance(serviceID string) error {
r := a.c.newRequest("PUT", "/v1/agent/service/maintenance/"+serviceID)
r.params.Set("enable", "false")
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// EnableNodeMaintenance toggles node maintenance mode on for the
// agent we are connected to.
func (a *Agent) EnableNodeMaintenance(reason string) error {
r := a.c.newRequest("PUT", "/v1/agent/maintenance")
r.params.Set("enable", "true")
r.params.Set("reason", reason)
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}
// DisableNodeMaintenance toggles node maintenance mode off for the
// agent we are connected to.
func (a *Agent) DisableNodeMaintenance() error {
r := a.c.newRequest("PUT", "/v1/agent/maintenance")
r.params.Set("enable", "false")
_, resp, err := requireOK(a.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}

View file

@ -0,0 +1,591 @@
package api
import (
"bytes"
"crypto/tls"
"crypto/x509"
"encoding/json"
"fmt"
"io"
"io/ioutil"
"log"
"net"
"net/http"
"net/url"
"os"
"strconv"
"strings"
"time"
"github.com/hashicorp/go-cleanhttp"
)
// QueryOptions are used to parameterize a query
type QueryOptions struct {
// Providing a datacenter overwrites the DC provided
// by the Config
Datacenter string
// AllowStale allows any Consul server (non-leader) to service
// a read. This allows for lower latency and higher throughput
AllowStale bool
// RequireConsistent forces the read to be fully consistent.
// This is more expensive but prevents ever performing a stale
// read.
RequireConsistent bool
// WaitIndex is used to enable a blocking query. Waits
// until the timeout or the next index is reached
WaitIndex uint64
// WaitTime is used to bound the duration of a wait.
// Defaults to that of the Config, but can be overridden.
WaitTime time.Duration
// Token is used to provide a per-request ACL token
// which overrides the agent's default token.
Token string
// Near is used to provide a node name that will sort the results
// in ascending order based on the estimated round trip time from
// that node. Setting this to "_agent" will use the agent's node
// for the sort.
Near string
}
// WriteOptions are used to parameterize a write
type WriteOptions struct {
// Providing a datacenter overwrites the DC provided
// by the Config
Datacenter string
// Token is used to provide a per-request ACL token
// which overrides the agent's default token.
Token string
}
// QueryMeta is used to return meta data about a query
type QueryMeta struct {
// LastIndex. This can be used as a WaitIndex to perform
// a blocking query
LastIndex uint64
// Time of last contact from the leader for the
// server servicing the request
LastContact time.Duration
// Is there a known leader
KnownLeader bool
// How long did the request take
RequestTime time.Duration
// Is address translation enabled for HTTP responses on this agent
AddressTranslationEnabled bool
}
// WriteMeta is used to return meta data about a write
type WriteMeta struct {
// How long did the request take
RequestTime time.Duration
}
// HttpBasicAuth is used to authenticate http client with HTTP Basic Authentication
type HttpBasicAuth struct {
// Username to use for HTTP Basic Authentication
Username string
// Password to use for HTTP Basic Authentication
Password string
}
// Config is used to configure the creation of a client
type Config struct {
// Address is the address of the Consul server
Address string
// Scheme is the URI scheme for the Consul server
Scheme string
// Datacenter to use. If not provided, the default agent datacenter is used.
Datacenter string
// HttpClient is the client to use. Default will be
// used if not provided.
HttpClient *http.Client
// HttpAuth is the auth info to use for http access.
HttpAuth *HttpBasicAuth
// WaitTime limits how long a Watch will block. If not provided,
// the agent default values will be used.
WaitTime time.Duration
// Token is used to provide a per-request ACL token
// which overrides the agent's default token.
Token string
}
// TLSConfig is used to generate a TLSClientConfig that's useful for talking to
// Consul using TLS.
type TLSConfig struct {
// Address is the optional address of the Consul server. The port, if any
// will be removed from here and this will be set to the ServerName of the
// resulting config.
Address string
// CAFile is the optional path to the CA certificate used for Consul
// communication, defaults to the system bundle if not specified.
CAFile string
// CertFile is the optional path to the certificate for Consul
// communication. If this is set then you need to also set KeyFile.
CertFile string
// KeyFile is the optional path to the private key for Consul communication.
// If this is set then you need to also set CertFile.
KeyFile string
// InsecureSkipVerify if set to true will disable TLS host verification.
InsecureSkipVerify bool
}
// DefaultConfig returns a default configuration for the client. By default this
// will pool and reuse idle connections to Consul. If you have a long-lived
// client object, this is the desired behavior and should make the most efficient
// use of the connections to Consul. If you don't reuse a client object , which
// is not recommended, then you may notice idle connections building up over
// time. To avoid this, use the DefaultNonPooledConfig() instead.
func DefaultConfig() *Config {
return defaultConfig(cleanhttp.DefaultPooledTransport)
}
// DefaultNonPooledConfig returns a default configuration for the client which
// does not pool connections. This isn't a recommended configuration because it
// will reconnect to Consul on every request, but this is useful to avoid the
// accumulation of idle connections if you make many client objects during the
// lifetime of your application.
func DefaultNonPooledConfig() *Config {
return defaultConfig(cleanhttp.DefaultTransport)
}
// defaultConfig returns the default configuration for the client, using the
// given function to make the transport.
func defaultConfig(transportFn func() *http.Transport) *Config {
config := &Config{
Address: "127.0.0.1:8500",
Scheme: "http",
HttpClient: &http.Client{
Transport: transportFn(),
},
}
if addr := os.Getenv("CONSUL_HTTP_ADDR"); addr != "" {
config.Address = addr
}
if token := os.Getenv("CONSUL_HTTP_TOKEN"); token != "" {
config.Token = token
}
if auth := os.Getenv("CONSUL_HTTP_AUTH"); auth != "" {
var username, password string
if strings.Contains(auth, ":") {
split := strings.SplitN(auth, ":", 2)
username = split[0]
password = split[1]
} else {
username = auth
}
config.HttpAuth = &HttpBasicAuth{
Username: username,
Password: password,
}
}
if ssl := os.Getenv("CONSUL_HTTP_SSL"); ssl != "" {
enabled, err := strconv.ParseBool(ssl)
if err != nil {
log.Printf("[WARN] client: could not parse CONSUL_HTTP_SSL: %s", err)
}
if enabled {
config.Scheme = "https"
}
}
if verify := os.Getenv("CONSUL_HTTP_SSL_VERIFY"); verify != "" {
doVerify, err := strconv.ParseBool(verify)
if err != nil {
log.Printf("[WARN] client: could not parse CONSUL_HTTP_SSL_VERIFY: %s", err)
}
if !doVerify {
tlsClientConfig, err := SetupTLSConfig(&TLSConfig{
InsecureSkipVerify: true,
})
// We don't expect this to fail given that we aren't
// parsing any of the input, but we panic just in case
// since this doesn't have an error return.
if err != nil {
panic(err)
}
transport := transportFn()
transport.TLSClientConfig = tlsClientConfig
config.HttpClient.Transport = transport
}
}
return config
}
// TLSConfig is used to generate a TLSClientConfig that's useful for talking to
// Consul using TLS.
func SetupTLSConfig(tlsConfig *TLSConfig) (*tls.Config, error) {
tlsClientConfig := &tls.Config{
InsecureSkipVerify: tlsConfig.InsecureSkipVerify,
}
if tlsConfig.Address != "" {
server := tlsConfig.Address
hasPort := strings.LastIndex(server, ":") > strings.LastIndex(server, "]")
if hasPort {
var err error
server, _, err = net.SplitHostPort(server)
if err != nil {
return nil, err
}
}
tlsClientConfig.ServerName = server
}
if tlsConfig.CertFile != "" && tlsConfig.KeyFile != "" {
tlsCert, err := tls.LoadX509KeyPair(tlsConfig.CertFile, tlsConfig.KeyFile)
if err != nil {
return nil, err
}
tlsClientConfig.Certificates = []tls.Certificate{tlsCert}
}
if tlsConfig.CAFile != "" {
data, err := ioutil.ReadFile(tlsConfig.CAFile)
if err != nil {
return nil, fmt.Errorf("failed to read CA file: %v", err)
}
caPool := x509.NewCertPool()
if !caPool.AppendCertsFromPEM(data) {
return nil, fmt.Errorf("failed to parse CA certificate")
}
tlsClientConfig.RootCAs = caPool
}
return tlsClientConfig, nil
}
// Client provides a client to the Consul API
type Client struct {
config Config
}
// NewClient returns a new client
func NewClient(config *Config) (*Client, error) {
// bootstrap the config
defConfig := DefaultConfig()
if len(config.Address) == 0 {
config.Address = defConfig.Address
}
if len(config.Scheme) == 0 {
config.Scheme = defConfig.Scheme
}
if config.HttpClient == nil {
config.HttpClient = defConfig.HttpClient
}
if parts := strings.SplitN(config.Address, "unix://", 2); len(parts) == 2 {
trans := cleanhttp.DefaultTransport()
trans.Dial = func(_, _ string) (net.Conn, error) {
return net.Dial("unix", parts[1])
}
config.HttpClient = &http.Client{
Transport: trans,
}
config.Address = parts[1]
}
client := &Client{
config: *config,
}
return client, nil
}
// request is used to help build up a request
type request struct {
config *Config
method string
url *url.URL
params url.Values
body io.Reader
header http.Header
obj interface{}
}
// setQueryOptions is used to annotate the request with
// additional query options
func (r *request) setQueryOptions(q *QueryOptions) {
if q == nil {
return
}
if q.Datacenter != "" {
r.params.Set("dc", q.Datacenter)
}
if q.AllowStale {
r.params.Set("stale", "")
}
if q.RequireConsistent {
r.params.Set("consistent", "")
}
if q.WaitIndex != 0 {
r.params.Set("index", strconv.FormatUint(q.WaitIndex, 10))
}
if q.WaitTime != 0 {
r.params.Set("wait", durToMsec(q.WaitTime))
}
if q.Token != "" {
r.header.Set("X-Consul-Token", q.Token)
}
if q.Near != "" {
r.params.Set("near", q.Near)
}
}
// durToMsec converts a duration to a millisecond specified string. If the
// user selected a positive value that rounds to 0 ms, then we will use 1 ms
// so they get a short delay, otherwise Consul will translate the 0 ms into
// a huge default delay.
func durToMsec(dur time.Duration) string {
ms := dur / time.Millisecond
if dur > 0 && ms == 0 {
ms = 1
}
return fmt.Sprintf("%dms", ms)
}
// serverError is a string we look for to detect 500 errors.
const serverError = "Unexpected response code: 500"
// IsServerError returns true for 500 errors from the Consul servers, these are
// usually retryable at a later time.
func IsServerError(err error) bool {
if err == nil {
return false
}
// TODO (slackpad) - Make a real error type here instead of using
// a string check.
return strings.Contains(err.Error(), serverError)
}
// setWriteOptions is used to annotate the request with
// additional write options
func (r *request) setWriteOptions(q *WriteOptions) {
if q == nil {
return
}
if q.Datacenter != "" {
r.params.Set("dc", q.Datacenter)
}
if q.Token != "" {
r.header.Set("X-Consul-Token", q.Token)
}
}
// toHTTP converts the request to an HTTP request
func (r *request) toHTTP() (*http.Request, error) {
// Encode the query parameters
r.url.RawQuery = r.params.Encode()
// Check if we should encode the body
if r.body == nil && r.obj != nil {
if b, err := encodeBody(r.obj); err != nil {
return nil, err
} else {
r.body = b
}
}
// Create the HTTP request
req, err := http.NewRequest(r.method, r.url.RequestURI(), r.body)
if err != nil {
return nil, err
}
req.URL.Host = r.url.Host
req.URL.Scheme = r.url.Scheme
req.Host = r.url.Host
req.Header = r.header
// Setup auth
if r.config.HttpAuth != nil {
req.SetBasicAuth(r.config.HttpAuth.Username, r.config.HttpAuth.Password)
}
return req, nil
}
// newRequest is used to create a new request
func (c *Client) newRequest(method, path string) *request {
r := &request{
config: &c.config,
method: method,
url: &url.URL{
Scheme: c.config.Scheme,
Host: c.config.Address,
Path: path,
},
params: make(map[string][]string),
header: make(http.Header),
}
if c.config.Datacenter != "" {
r.params.Set("dc", c.config.Datacenter)
}
if c.config.WaitTime != 0 {
r.params.Set("wait", durToMsec(r.config.WaitTime))
}
if c.config.Token != "" {
r.header.Set("X-Consul-Token", r.config.Token)
}
return r
}
// doRequest runs a request with our client
func (c *Client) doRequest(r *request) (time.Duration, *http.Response, error) {
req, err := r.toHTTP()
if err != nil {
return 0, nil, err
}
start := time.Now()
resp, err := c.config.HttpClient.Do(req)
diff := time.Now().Sub(start)
return diff, resp, err
}
// Query is used to do a GET request against an endpoint
// and deserialize the response into an interface using
// standard Consul conventions.
func (c *Client) query(endpoint string, out interface{}, q *QueryOptions) (*QueryMeta, error) {
r := c.newRequest("GET", endpoint)
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
if err := decodeBody(resp, out); err != nil {
return nil, err
}
return qm, nil
}
// write is used to do a PUT request against an endpoint
// and serialize/deserialized using the standard Consul conventions.
func (c *Client) write(endpoint string, in, out interface{}, q *WriteOptions) (*WriteMeta, error) {
r := c.newRequest("PUT", endpoint)
r.setWriteOptions(q)
r.obj = in
rtt, resp, err := requireOK(c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
if out != nil {
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
}
return wm, nil
}
// parseQueryMeta is used to help parse query meta-data
func parseQueryMeta(resp *http.Response, q *QueryMeta) error {
header := resp.Header
// Parse the X-Consul-Index
index, err := strconv.ParseUint(header.Get("X-Consul-Index"), 10, 64)
if err != nil {
return fmt.Errorf("Failed to parse X-Consul-Index: %v", err)
}
q.LastIndex = index
// Parse the X-Consul-LastContact
last, err := strconv.ParseUint(header.Get("X-Consul-LastContact"), 10, 64)
if err != nil {
return fmt.Errorf("Failed to parse X-Consul-LastContact: %v", err)
}
q.LastContact = time.Duration(last) * time.Millisecond
// Parse the X-Consul-KnownLeader
switch header.Get("X-Consul-KnownLeader") {
case "true":
q.KnownLeader = true
default:
q.KnownLeader = false
}
// Parse X-Consul-Translate-Addresses
switch header.Get("X-Consul-Translate-Addresses") {
case "true":
q.AddressTranslationEnabled = true
default:
q.AddressTranslationEnabled = false
}
return nil
}
// decodeBody is used to JSON decode a body
func decodeBody(resp *http.Response, out interface{}) error {
dec := json.NewDecoder(resp.Body)
return dec.Decode(out)
}
// encodeBody is used to encode a request body
func encodeBody(obj interface{}) (io.Reader, error) {
buf := bytes.NewBuffer(nil)
enc := json.NewEncoder(buf)
if err := enc.Encode(obj); err != nil {
return nil, err
}
return buf, nil
}
// requireOK is used to wrap doRequest and check for a 200
func requireOK(d time.Duration, resp *http.Response, e error) (time.Duration, *http.Response, error) {
if e != nil {
if resp != nil {
resp.Body.Close()
}
return d, nil, e
}
if resp.StatusCode != 200 {
var buf bytes.Buffer
io.Copy(&buf, resp.Body)
resp.Body.Close()
return d, nil, fmt.Errorf("Unexpected response code: %d (%s)", resp.StatusCode, buf.Bytes())
}
return d, resp, nil
}

View file

@ -0,0 +1,186 @@
package api
type Node struct {
Node string
Address string
TaggedAddresses map[string]string
}
type CatalogService struct {
Node string
Address string
TaggedAddresses map[string]string
ServiceID string
ServiceName string
ServiceAddress string
ServiceTags []string
ServicePort int
ServiceEnableTagOverride bool
}
type CatalogNode struct {
Node *Node
Services map[string]*AgentService
}
type CatalogRegistration struct {
Node string
Address string
TaggedAddresses map[string]string
Datacenter string
Service *AgentService
Check *AgentCheck
}
type CatalogDeregistration struct {
Node string
Address string
Datacenter string
ServiceID string
CheckID string
}
// Catalog can be used to query the Catalog endpoints
type Catalog struct {
c *Client
}
// Catalog returns a handle to the catalog endpoints
func (c *Client) Catalog() *Catalog {
return &Catalog{c}
}
func (c *Catalog) Register(reg *CatalogRegistration, q *WriteOptions) (*WriteMeta, error) {
r := c.c.newRequest("PUT", "/v1/catalog/register")
r.setWriteOptions(q)
r.obj = reg
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, err
}
resp.Body.Close()
wm := &WriteMeta{}
wm.RequestTime = rtt
return wm, nil
}
func (c *Catalog) Deregister(dereg *CatalogDeregistration, q *WriteOptions) (*WriteMeta, error) {
r := c.c.newRequest("PUT", "/v1/catalog/deregister")
r.setWriteOptions(q)
r.obj = dereg
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, err
}
resp.Body.Close()
wm := &WriteMeta{}
wm.RequestTime = rtt
return wm, nil
}
// Datacenters is used to query for all the known datacenters
func (c *Catalog) Datacenters() ([]string, error) {
r := c.c.newRequest("GET", "/v1/catalog/datacenters")
_, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out []string
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// Nodes is used to query all the known nodes
func (c *Catalog) Nodes(q *QueryOptions) ([]*Node, *QueryMeta, error) {
r := c.c.newRequest("GET", "/v1/catalog/nodes")
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*Node
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Services is used to query for all known services
func (c *Catalog) Services(q *QueryOptions) (map[string][]string, *QueryMeta, error) {
r := c.c.newRequest("GET", "/v1/catalog/services")
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out map[string][]string
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Service is used to query catalog entries for a given service
func (c *Catalog) Service(service, tag string, q *QueryOptions) ([]*CatalogService, *QueryMeta, error) {
r := c.c.newRequest("GET", "/v1/catalog/service/"+service)
r.setQueryOptions(q)
if tag != "" {
r.params.Set("tag", tag)
}
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*CatalogService
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Node is used to query for service information about a single node
func (c *Catalog) Node(node string, q *QueryOptions) (*CatalogNode, *QueryMeta, error) {
r := c.c.newRequest("GET", "/v1/catalog/node/"+node)
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out *CatalogNode
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}

View file

@ -0,0 +1,66 @@
package api
import (
"github.com/hashicorp/serf/coordinate"
)
// CoordinateEntry represents a node and its associated network coordinate.
type CoordinateEntry struct {
Node string
Coord *coordinate.Coordinate
}
// CoordinateDatacenterMap represents a datacenter and its associated WAN
// nodes and their associates coordinates.
type CoordinateDatacenterMap struct {
Datacenter string
Coordinates []CoordinateEntry
}
// Coordinate can be used to query the coordinate endpoints
type Coordinate struct {
c *Client
}
// Coordinate returns a handle to the coordinate endpoints
func (c *Client) Coordinate() *Coordinate {
return &Coordinate{c}
}
// Datacenters is used to return the coordinates of all the servers in the WAN
// pool.
func (c *Coordinate) Datacenters() ([]*CoordinateDatacenterMap, error) {
r := c.c.newRequest("GET", "/v1/coordinate/datacenters")
_, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out []*CoordinateDatacenterMap
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return out, nil
}
// Nodes is used to return the coordinates of all the nodes in the LAN pool.
func (c *Coordinate) Nodes(q *QueryOptions) ([]*CoordinateEntry, *QueryMeta, error) {
r := c.c.newRequest("GET", "/v1/coordinate/nodes")
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*CoordinateEntry
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}

View file

@ -0,0 +1,104 @@
package api
import (
"bytes"
"strconv"
)
// Event can be used to query the Event endpoints
type Event struct {
c *Client
}
// UserEvent represents an event that was fired by the user
type UserEvent struct {
ID string
Name string
Payload []byte
NodeFilter string
ServiceFilter string
TagFilter string
Version int
LTime uint64
}
// Event returns a handle to the event endpoints
func (c *Client) Event() *Event {
return &Event{c}
}
// Fire is used to fire a new user event. Only the Name, Payload and Filters
// are respected. This returns the ID or an associated error. Cross DC requests
// are supported.
func (e *Event) Fire(params *UserEvent, q *WriteOptions) (string, *WriteMeta, error) {
r := e.c.newRequest("PUT", "/v1/event/fire/"+params.Name)
r.setWriteOptions(q)
if params.NodeFilter != "" {
r.params.Set("node", params.NodeFilter)
}
if params.ServiceFilter != "" {
r.params.Set("service", params.ServiceFilter)
}
if params.TagFilter != "" {
r.params.Set("tag", params.TagFilter)
}
if params.Payload != nil {
r.body = bytes.NewReader(params.Payload)
}
rtt, resp, err := requireOK(e.c.doRequest(r))
if err != nil {
return "", nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
var out UserEvent
if err := decodeBody(resp, &out); err != nil {
return "", nil, err
}
return out.ID, wm, nil
}
// List is used to get the most recent events an agent has received.
// This list can be optionally filtered by the name. This endpoint supports
// quasi-blocking queries. The index is not monotonic, nor does it provide provide
// LastContact or KnownLeader.
func (e *Event) List(name string, q *QueryOptions) ([]*UserEvent, *QueryMeta, error) {
r := e.c.newRequest("GET", "/v1/event/list")
r.setQueryOptions(q)
if name != "" {
r.params.Set("name", name)
}
rtt, resp, err := requireOK(e.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var entries []*UserEvent
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
return entries, qm, nil
}
// IDToIndex is a bit of a hack. This simulates the index generation to
// convert an event ID into a WaitIndex.
func (e *Event) IDToIndex(uuid string) uint64 {
lower := uuid[0:8] + uuid[9:13] + uuid[14:18]
upper := uuid[19:23] + uuid[24:36]
lowVal, err := strconv.ParseUint(lower, 16, 64)
if err != nil {
panic("Failed to convert " + lower)
}
highVal, err := strconv.ParseUint(upper, 16, 64)
if err != nil {
panic("Failed to convert " + upper)
}
return lowVal ^ highVal
}

View file

@ -0,0 +1,144 @@
package api
import (
"fmt"
)
const (
// HealthAny is special, and is used as a wild card,
// not as a specific state.
HealthAny = "any"
HealthPassing = "passing"
HealthWarning = "warning"
HealthCritical = "critical"
)
// HealthCheck is used to represent a single check
type HealthCheck struct {
Node string
CheckID string
Name string
Status string
Notes string
Output string
ServiceID string
ServiceName string
}
// ServiceEntry is used for the health service endpoint
type ServiceEntry struct {
Node *Node
Service *AgentService
Checks []*HealthCheck
}
// Health can be used to query the Health endpoints
type Health struct {
c *Client
}
// Health returns a handle to the health endpoints
func (c *Client) Health() *Health {
return &Health{c}
}
// Node is used to query for checks belonging to a given node
func (h *Health) Node(node string, q *QueryOptions) ([]*HealthCheck, *QueryMeta, error) {
r := h.c.newRequest("GET", "/v1/health/node/"+node)
r.setQueryOptions(q)
rtt, resp, err := requireOK(h.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*HealthCheck
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Checks is used to return the checks associated with a service
func (h *Health) Checks(service string, q *QueryOptions) ([]*HealthCheck, *QueryMeta, error) {
r := h.c.newRequest("GET", "/v1/health/checks/"+service)
r.setQueryOptions(q)
rtt, resp, err := requireOK(h.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*HealthCheck
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Service is used to query health information along with service info
// for a given service. It can optionally do server-side filtering on a tag
// or nodes with passing health checks only.
func (h *Health) Service(service, tag string, passingOnly bool, q *QueryOptions) ([]*ServiceEntry, *QueryMeta, error) {
r := h.c.newRequest("GET", "/v1/health/service/"+service)
r.setQueryOptions(q)
if tag != "" {
r.params.Set("tag", tag)
}
if passingOnly {
r.params.Set(HealthPassing, "1")
}
rtt, resp, err := requireOK(h.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*ServiceEntry
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}
// State is used to retrieve all the checks in a given state.
// The wildcard "any" state can also be used for all checks.
func (h *Health) State(state string, q *QueryOptions) ([]*HealthCheck, *QueryMeta, error) {
switch state {
case HealthAny:
case HealthWarning:
case HealthCritical:
case HealthPassing:
default:
return nil, nil, fmt.Errorf("Unsupported state: %v", state)
}
r := h.c.newRequest("GET", "/v1/health/state/"+state)
r.setQueryOptions(q)
rtt, resp, err := requireOK(h.c.doRequest(r))
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
var out []*HealthCheck
if err := decodeBody(resp, &out); err != nil {
return nil, nil, err
}
return out, qm, nil
}

View file

@ -0,0 +1,418 @@
package api
import (
"bytes"
"fmt"
"io"
"net/http"
"strconv"
"strings"
)
// KVPair is used to represent a single K/V entry
type KVPair struct {
// Key is the name of the key. It is also part of the URL path when accessed
// via the API.
Key string
// CreateIndex holds the index corresponding the creation of this KVPair. This
// is a read-only field.
CreateIndex uint64
// ModifyIndex is used for the Check-And-Set operations and can also be fed
// back into the WaitIndex of the QueryOptions in order to perform blocking
// queries.
ModifyIndex uint64
// LockIndex holds the index corresponding to a lock on this key, if any. This
// is a read-only field.
LockIndex uint64
// Flags are any user-defined flags on the key. It is up to the implementer
// to check these values, since Consul does not treat them specially.
Flags uint64
// Value is the value for the key. This can be any value, but it will be
// base64 encoded upon transport.
Value []byte
// Session is a string representing the ID of the session. Any other
// interactions with this key over the same session must specify the same
// session ID.
Session string
}
// KVPairs is a list of KVPair objects
type KVPairs []*KVPair
// KVOp constants give possible operations available in a KVTxn.
type KVOp string
const (
KVSet KVOp = "set"
KVDelete = "delete"
KVDeleteCAS = "delete-cas"
KVDeleteTree = "delete-tree"
KVCAS = "cas"
KVLock = "lock"
KVUnlock = "unlock"
KVGet = "get"
KVGetTree = "get-tree"
KVCheckSession = "check-session"
KVCheckIndex = "check-index"
)
// KVTxnOp defines a single operation inside a transaction.
type KVTxnOp struct {
Verb string
Key string
Value []byte
Flags uint64
Index uint64
Session string
}
// KVTxnOps defines a set of operations to be performed inside a single
// transaction.
type KVTxnOps []*KVTxnOp
// KVTxnResponse has the outcome of a transaction.
type KVTxnResponse struct {
Results []*KVPair
Errors TxnErrors
}
// KV is used to manipulate the K/V API
type KV struct {
c *Client
}
// KV is used to return a handle to the K/V apis
func (c *Client) KV() *KV {
return &KV{c}
}
// Get is used to lookup a single key
func (k *KV) Get(key string, q *QueryOptions) (*KVPair, *QueryMeta, error) {
resp, qm, err := k.getInternal(key, nil, q)
if err != nil {
return nil, nil, err
}
if resp == nil {
return nil, qm, nil
}
defer resp.Body.Close()
var entries []*KVPair
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
if len(entries) > 0 {
return entries[0], qm, nil
}
return nil, qm, nil
}
// List is used to lookup all keys under a prefix
func (k *KV) List(prefix string, q *QueryOptions) (KVPairs, *QueryMeta, error) {
resp, qm, err := k.getInternal(prefix, map[string]string{"recurse": ""}, q)
if err != nil {
return nil, nil, err
}
if resp == nil {
return nil, qm, nil
}
defer resp.Body.Close()
var entries []*KVPair
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
return entries, qm, nil
}
// Keys is used to list all the keys under a prefix. Optionally,
// a separator can be used to limit the responses.
func (k *KV) Keys(prefix, separator string, q *QueryOptions) ([]string, *QueryMeta, error) {
params := map[string]string{"keys": ""}
if separator != "" {
params["separator"] = separator
}
resp, qm, err := k.getInternal(prefix, params, q)
if err != nil {
return nil, nil, err
}
if resp == nil {
return nil, qm, nil
}
defer resp.Body.Close()
var entries []string
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, err
}
return entries, qm, nil
}
func (k *KV) getInternal(key string, params map[string]string, q *QueryOptions) (*http.Response, *QueryMeta, error) {
r := k.c.newRequest("GET", "/v1/kv/"+key)
r.setQueryOptions(q)
for param, val := range params {
r.params.Set(param, val)
}
rtt, resp, err := k.c.doRequest(r)
if err != nil {
return nil, nil, err
}
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
if resp.StatusCode == 404 {
resp.Body.Close()
return nil, qm, nil
} else if resp.StatusCode != 200 {
resp.Body.Close()
return nil, nil, fmt.Errorf("Unexpected response code: %d", resp.StatusCode)
}
return resp, qm, nil
}
// Put is used to write a new value. Only the
// Key, Flags and Value is respected.
func (k *KV) Put(p *KVPair, q *WriteOptions) (*WriteMeta, error) {
params := make(map[string]string, 1)
if p.Flags != 0 {
params["flags"] = strconv.FormatUint(p.Flags, 10)
}
_, wm, err := k.put(p.Key, params, p.Value, q)
return wm, err
}
// CAS is used for a Check-And-Set operation. The Key,
// ModifyIndex, Flags and Value are respected. Returns true
// on success or false on failures.
func (k *KV) CAS(p *KVPair, q *WriteOptions) (bool, *WriteMeta, error) {
params := make(map[string]string, 2)
if p.Flags != 0 {
params["flags"] = strconv.FormatUint(p.Flags, 10)
}
params["cas"] = strconv.FormatUint(p.ModifyIndex, 10)
return k.put(p.Key, params, p.Value, q)
}
// Acquire is used for a lock acquisition operation. The Key,
// Flags, Value and Session are respected. Returns true
// on success or false on failures.
func (k *KV) Acquire(p *KVPair, q *WriteOptions) (bool, *WriteMeta, error) {
params := make(map[string]string, 2)
if p.Flags != 0 {
params["flags"] = strconv.FormatUint(p.Flags, 10)
}
params["acquire"] = p.Session
return k.put(p.Key, params, p.Value, q)
}
// Release is used for a lock release operation. The Key,
// Flags, Value and Session are respected. Returns true
// on success or false on failures.
func (k *KV) Release(p *KVPair, q *WriteOptions) (bool, *WriteMeta, error) {
params := make(map[string]string, 2)
if p.Flags != 0 {
params["flags"] = strconv.FormatUint(p.Flags, 10)
}
params["release"] = p.Session
return k.put(p.Key, params, p.Value, q)
}
func (k *KV) put(key string, params map[string]string, body []byte, q *WriteOptions) (bool, *WriteMeta, error) {
if len(key) > 0 && key[0] == '/' {
return false, nil, fmt.Errorf("Invalid key. Key must not begin with a '/': %s", key)
}
r := k.c.newRequest("PUT", "/v1/kv/"+key)
r.setWriteOptions(q)
for param, val := range params {
r.params.Set(param, val)
}
r.body = bytes.NewReader(body)
rtt, resp, err := requireOK(k.c.doRequest(r))
if err != nil {
return false, nil, err
}
defer resp.Body.Close()
qm := &WriteMeta{}
qm.RequestTime = rtt
var buf bytes.Buffer
if _, err := io.Copy(&buf, resp.Body); err != nil {
return false, nil, fmt.Errorf("Failed to read response: %v", err)
}
res := strings.Contains(string(buf.Bytes()), "true")
return res, qm, nil
}
// Delete is used to delete a single key
func (k *KV) Delete(key string, w *WriteOptions) (*WriteMeta, error) {
_, qm, err := k.deleteInternal(key, nil, w)
return qm, err
}
// DeleteCAS is used for a Delete Check-And-Set operation. The Key
// and ModifyIndex are respected. Returns true on success or false on failures.
func (k *KV) DeleteCAS(p *KVPair, q *WriteOptions) (bool, *WriteMeta, error) {
params := map[string]string{
"cas": strconv.FormatUint(p.ModifyIndex, 10),
}
return k.deleteInternal(p.Key, params, q)
}
// DeleteTree is used to delete all keys under a prefix
func (k *KV) DeleteTree(prefix string, w *WriteOptions) (*WriteMeta, error) {
_, qm, err := k.deleteInternal(prefix, map[string]string{"recurse": ""}, w)
return qm, err
}
func (k *KV) deleteInternal(key string, params map[string]string, q *WriteOptions) (bool, *WriteMeta, error) {
r := k.c.newRequest("DELETE", "/v1/kv/"+key)
r.setWriteOptions(q)
for param, val := range params {
r.params.Set(param, val)
}
rtt, resp, err := requireOK(k.c.doRequest(r))
if err != nil {
return false, nil, err
}
defer resp.Body.Close()
qm := &WriteMeta{}
qm.RequestTime = rtt
var buf bytes.Buffer
if _, err := io.Copy(&buf, resp.Body); err != nil {
return false, nil, fmt.Errorf("Failed to read response: %v", err)
}
res := strings.Contains(string(buf.Bytes()), "true")
return res, qm, nil
}
// TxnOp is the internal format we send to Consul. It's not specific to KV,
// though currently only KV operations are supported.
type TxnOp struct {
KV *KVTxnOp
}
// TxnOps is a list of transaction operations.
type TxnOps []*TxnOp
// TxnResult is the internal format we receive from Consul.
type TxnResult struct {
KV *KVPair
}
// TxnResults is a list of TxnResult objects.
type TxnResults []*TxnResult
// TxnError is used to return information about an operation in a transaction.
type TxnError struct {
OpIndex int
What string
}
// TxnErrors is a list of TxnError objects.
type TxnErrors []*TxnError
// TxnResponse is the internal format we receive from Consul.
type TxnResponse struct {
Results TxnResults
Errors TxnErrors
}
// Txn is used to apply multiple KV operations in a single, atomic transaction.
//
// Note that Go will perform the required base64 encoding on the values
// automatically because the type is a byte slice. Transactions are defined as a
// list of operations to perform, using the KVOp constants and KVTxnOp structure
// to define operations. If any operation fails, none of the changes are applied
// to the state store. Note that this hides the internal raw transaction interface
// and munges the input and output types into KV-specific ones for ease of use.
// If there are more non-KV operations in the future we may break out a new
// transaction API client, but it will be easy to keep this KV-specific variant
// supported.
//
// Even though this is generally a write operation, we take a QueryOptions input
// and return a QueryMeta output. If the transaction contains only read ops, then
// Consul will fast-path it to a different endpoint internally which supports
// consistency controls, but not blocking. If there are write operations then
// the request will always be routed through raft and any consistency settings
// will be ignored.
//
// Here's an example:
//
// ops := KVTxnOps{
// &KVTxnOp{
// Verb: KVLock,
// Key: "test/lock",
// Session: "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
// Value: []byte("hello"),
// },
// &KVTxnOp{
// Verb: KVGet,
// Key: "another/key",
// },
// }
// ok, response, _, err := kv.Txn(&ops, nil)
//
// If there is a problem making the transaction request then an error will be
// returned. Otherwise, the ok value will be true if the transaction succeeded
// or false if it was rolled back. The response is a structured return value which
// will have the outcome of the transaction. Its Results member will have entries
// for each operation. Deleted keys will have a nil entry in the, and to save
// space, the Value of each key in the Results will be nil unless the operation
// is a KVGet. If the transaction was rolled back, the Errors member will have
// entries referencing the index of the operation that failed along with an error
// message.
func (k *KV) Txn(txn KVTxnOps, q *QueryOptions) (bool, *KVTxnResponse, *QueryMeta, error) {
r := k.c.newRequest("PUT", "/v1/txn")
r.setQueryOptions(q)
// Convert into the internal format since this is an all-KV txn.
ops := make(TxnOps, 0, len(txn))
for _, kvOp := range txn {
ops = append(ops, &TxnOp{KV: kvOp})
}
r.obj = ops
rtt, resp, err := k.c.doRequest(r)
if err != nil {
return false, nil, nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
if resp.StatusCode == http.StatusOK || resp.StatusCode == http.StatusConflict {
var txnResp TxnResponse
if err := decodeBody(resp, &txnResp); err != nil {
return false, nil, nil, err
}
// Convert from the internal format.
kvResp := KVTxnResponse{
Errors: txnResp.Errors,
}
for _, result := range txnResp.Results {
kvResp.Results = append(kvResp.Results, result.KV)
}
return resp.StatusCode == http.StatusOK, &kvResp, qm, nil
}
var buf bytes.Buffer
if _, err := io.Copy(&buf, resp.Body); err != nil {
return false, nil, nil, fmt.Errorf("Failed to read response: %v", err)
}
return false, nil, nil, fmt.Errorf("Failed request: %s", buf.String())
}

View file

@ -0,0 +1,384 @@
package api
import (
"fmt"
"sync"
"time"
)
const (
// DefaultLockSessionName is the Session Name we assign if none is provided
DefaultLockSessionName = "Consul API Lock"
// DefaultLockSessionTTL is the default session TTL if no Session is provided
// when creating a new Lock. This is used because we do not have another
// other check to depend upon.
DefaultLockSessionTTL = "15s"
// DefaultLockWaitTime is how long we block for at a time to check if lock
// acquisition is possible. This affects the minimum time it takes to cancel
// a Lock acquisition.
DefaultLockWaitTime = 15 * time.Second
// DefaultLockRetryTime is how long we wait after a failed lock acquisition
// before attempting to do the lock again. This is so that once a lock-delay
// is in effect, we do not hot loop retrying the acquisition.
DefaultLockRetryTime = 5 * time.Second
// DefaultMonitorRetryTime is how long we wait after a failed monitor check
// of a lock (500 response code). This allows the monitor to ride out brief
// periods of unavailability, subject to the MonitorRetries setting in the
// lock options which is by default set to 0, disabling this feature. This
// affects locks and semaphores.
DefaultMonitorRetryTime = 2 * time.Second
// LockFlagValue is a magic flag we set to indicate a key
// is being used for a lock. It is used to detect a potential
// conflict with a semaphore.
LockFlagValue = 0x2ddccbc058a50c18
)
var (
// ErrLockHeld is returned if we attempt to double lock
ErrLockHeld = fmt.Errorf("Lock already held")
// ErrLockNotHeld is returned if we attempt to unlock a lock
// that we do not hold.
ErrLockNotHeld = fmt.Errorf("Lock not held")
// ErrLockInUse is returned if we attempt to destroy a lock
// that is in use.
ErrLockInUse = fmt.Errorf("Lock in use")
// ErrLockConflict is returned if the flags on a key
// used for a lock do not match expectation
ErrLockConflict = fmt.Errorf("Existing key does not match lock use")
)
// Lock is used to implement client-side leader election. It is follows the
// algorithm as described here: https://www.consul.io/docs/guides/leader-election.html.
type Lock struct {
c *Client
opts *LockOptions
isHeld bool
sessionRenew chan struct{}
lockSession string
l sync.Mutex
}
// LockOptions is used to parameterize the Lock behavior.
type LockOptions struct {
Key string // Must be set and have write permissions
Value []byte // Optional, value to associate with the lock
Session string // Optional, created if not specified
SessionOpts *SessionEntry // Optional, options to use when creating a session
SessionName string // Optional, defaults to DefaultLockSessionName (ignored if SessionOpts is given)
SessionTTL string // Optional, defaults to DefaultLockSessionTTL (ignored if SessionOpts is given)
MonitorRetries int // Optional, defaults to 0 which means no retries
MonitorRetryTime time.Duration // Optional, defaults to DefaultMonitorRetryTime
LockWaitTime time.Duration // Optional, defaults to DefaultLockWaitTime
LockTryOnce bool // Optional, defaults to false which means try forever
}
// LockKey returns a handle to a lock struct which can be used
// to acquire and release the mutex. The key used must have
// write permissions.
func (c *Client) LockKey(key string) (*Lock, error) {
opts := &LockOptions{
Key: key,
}
return c.LockOpts(opts)
}
// LockOpts returns a handle to a lock struct which can be used
// to acquire and release the mutex. The key used must have
// write permissions.
func (c *Client) LockOpts(opts *LockOptions) (*Lock, error) {
if opts.Key == "" {
return nil, fmt.Errorf("missing key")
}
if opts.SessionName == "" {
opts.SessionName = DefaultLockSessionName
}
if opts.SessionTTL == "" {
opts.SessionTTL = DefaultLockSessionTTL
} else {
if _, err := time.ParseDuration(opts.SessionTTL); err != nil {
return nil, fmt.Errorf("invalid SessionTTL: %v", err)
}
}
if opts.MonitorRetryTime == 0 {
opts.MonitorRetryTime = DefaultMonitorRetryTime
}
if opts.LockWaitTime == 0 {
opts.LockWaitTime = DefaultLockWaitTime
}
l := &Lock{
c: c,
opts: opts,
}
return l, nil
}
// Lock attempts to acquire the lock and blocks while doing so.
// Providing a non-nil stopCh can be used to abort the lock attempt.
// Returns a channel that is closed if our lock is lost or an error.
// This channel could be closed at any time due to session invalidation,
// communication errors, operator intervention, etc. It is NOT safe to
// assume that the lock is held until Unlock() unless the Session is specifically
// created without any associated health checks. By default Consul sessions
// prefer liveness over safety and an application must be able to handle
// the lock being lost.
func (l *Lock) Lock(stopCh <-chan struct{}) (<-chan struct{}, error) {
// Hold the lock as we try to acquire
l.l.Lock()
defer l.l.Unlock()
// Check if we already hold the lock
if l.isHeld {
return nil, ErrLockHeld
}
// Check if we need to create a session first
l.lockSession = l.opts.Session
if l.lockSession == "" {
if s, err := l.createSession(); err != nil {
return nil, fmt.Errorf("failed to create session: %v", err)
} else {
l.sessionRenew = make(chan struct{})
l.lockSession = s
session := l.c.Session()
go session.RenewPeriodic(l.opts.SessionTTL, s, nil, l.sessionRenew)
// If we fail to acquire the lock, cleanup the session
defer func() {
if !l.isHeld {
close(l.sessionRenew)
l.sessionRenew = nil
}
}()
}
}
// Setup the query options
kv := l.c.KV()
qOpts := &QueryOptions{
WaitTime: l.opts.LockWaitTime,
}
start := time.Now()
attempts := 0
WAIT:
// Check if we should quit
select {
case <-stopCh:
return nil, nil
default:
}
// Handle the one-shot mode.
if l.opts.LockTryOnce && attempts > 0 {
elapsed := time.Now().Sub(start)
if elapsed > qOpts.WaitTime {
return nil, nil
}
qOpts.WaitTime -= elapsed
}
attempts++
// Look for an existing lock, blocking until not taken
pair, meta, err := kv.Get(l.opts.Key, qOpts)
if err != nil {
return nil, fmt.Errorf("failed to read lock: %v", err)
}
if pair != nil && pair.Flags != LockFlagValue {
return nil, ErrLockConflict
}
locked := false
if pair != nil && pair.Session == l.lockSession {
goto HELD
}
if pair != nil && pair.Session != "" {
qOpts.WaitIndex = meta.LastIndex
goto WAIT
}
// Try to acquire the lock
pair = l.lockEntry(l.lockSession)
locked, _, err = kv.Acquire(pair, nil)
if err != nil {
return nil, fmt.Errorf("failed to acquire lock: %v", err)
}
// Handle the case of not getting the lock
if !locked {
// Determine why the lock failed
qOpts.WaitIndex = 0
pair, meta, err = kv.Get(l.opts.Key, qOpts)
if pair != nil && pair.Session != "" {
//If the session is not null, this means that a wait can safely happen
//using a long poll
qOpts.WaitIndex = meta.LastIndex
goto WAIT
} else {
// If the session is empty and the lock failed to acquire, then it means
// a lock-delay is in effect and a timed wait must be used
select {
case <-time.After(DefaultLockRetryTime):
goto WAIT
case <-stopCh:
return nil, nil
}
}
}
HELD:
// Watch to ensure we maintain leadership
leaderCh := make(chan struct{})
go l.monitorLock(l.lockSession, leaderCh)
// Set that we own the lock
l.isHeld = true
// Locked! All done
return leaderCh, nil
}
// Unlock released the lock. It is an error to call this
// if the lock is not currently held.
func (l *Lock) Unlock() error {
// Hold the lock as we try to release
l.l.Lock()
defer l.l.Unlock()
// Ensure the lock is actually held
if !l.isHeld {
return ErrLockNotHeld
}
// Set that we no longer own the lock
l.isHeld = false
// Stop the session renew
if l.sessionRenew != nil {
defer func() {
close(l.sessionRenew)
l.sessionRenew = nil
}()
}
// Get the lock entry, and clear the lock session
lockEnt := l.lockEntry(l.lockSession)
l.lockSession = ""
// Release the lock explicitly
kv := l.c.KV()
_, _, err := kv.Release(lockEnt, nil)
if err != nil {
return fmt.Errorf("failed to release lock: %v", err)
}
return nil
}
// Destroy is used to cleanup the lock entry. It is not necessary
// to invoke. It will fail if the lock is in use.
func (l *Lock) Destroy() error {
// Hold the lock as we try to release
l.l.Lock()
defer l.l.Unlock()
// Check if we already hold the lock
if l.isHeld {
return ErrLockHeld
}
// Look for an existing lock
kv := l.c.KV()
pair, _, err := kv.Get(l.opts.Key, nil)
if err != nil {
return fmt.Errorf("failed to read lock: %v", err)
}
// Nothing to do if the lock does not exist
if pair == nil {
return nil
}
// Check for possible flag conflict
if pair.Flags != LockFlagValue {
return ErrLockConflict
}
// Check if it is in use
if pair.Session != "" {
return ErrLockInUse
}
// Attempt the delete
didRemove, _, err := kv.DeleteCAS(pair, nil)
if err != nil {
return fmt.Errorf("failed to remove lock: %v", err)
}
if !didRemove {
return ErrLockInUse
}
return nil
}
// createSession is used to create a new managed session
func (l *Lock) createSession() (string, error) {
session := l.c.Session()
se := l.opts.SessionOpts
if se == nil {
se = &SessionEntry{
Name: l.opts.SessionName,
TTL: l.opts.SessionTTL,
}
}
id, _, err := session.Create(se, nil)
if err != nil {
return "", err
}
return id, nil
}
// lockEntry returns a formatted KVPair for the lock
func (l *Lock) lockEntry(session string) *KVPair {
return &KVPair{
Key: l.opts.Key,
Value: l.opts.Value,
Session: session,
Flags: LockFlagValue,
}
}
// monitorLock is a long running routine to monitor a lock ownership
// It closes the stopCh if we lose our leadership.
func (l *Lock) monitorLock(session string, stopCh chan struct{}) {
defer close(stopCh)
kv := l.c.KV()
opts := &QueryOptions{RequireConsistent: true}
WAIT:
retries := l.opts.MonitorRetries
RETRY:
pair, meta, err := kv.Get(l.opts.Key, opts)
if err != nil {
// If configured we can try to ride out a brief Consul unavailability
// by doing retries. Note that we have to attempt the retry in a non-
// blocking fashion so that we have a clean place to reset the retry
// counter if service is restored.
if retries > 0 && IsServerError(err) {
time.Sleep(l.opts.MonitorRetryTime)
retries--
opts.WaitIndex = 0
goto RETRY
}
return
}
if pair != nil && pair.Session == session {
opts.WaitIndex = meta.LastIndex
goto WAIT
}
}

View file

@ -0,0 +1,81 @@
package api
// Operator can be used to perform low-level operator tasks for Consul.
type Operator struct {
c *Client
}
// Operator returns a handle to the operator endpoints.
func (c *Client) Operator() *Operator {
return &Operator{c}
}
// RaftServer has information about a server in the Raft configuration.
type RaftServer struct {
// ID is the unique ID for the server. These are currently the same
// as the address, but they will be changed to a real GUID in a future
// release of Consul.
ID string
// Node is the node name of the server, as known by Consul, or this
// will be set to "(unknown)" otherwise.
Node string
// Address is the IP:port of the server, used for Raft communications.
Address string
// Leader is true if this server is the current cluster leader.
Leader bool
// Voter is true if this server has a vote in the cluster. This might
// be false if the server is staging and still coming online, or if
// it's a non-voting server, which will be added in a future release of
// Consul.
Voter bool
}
// RaftConfigration is returned when querying for the current Raft configuration.
type RaftConfiguration struct {
// Servers has the list of servers in the Raft configuration.
Servers []*RaftServer
// Index has the Raft index of this configuration.
Index uint64
}
// RaftGetConfiguration is used to query the current Raft peer set.
func (op *Operator) RaftGetConfiguration(q *QueryOptions) (*RaftConfiguration, error) {
r := op.c.newRequest("GET", "/v1/operator/raft/configuration")
r.setQueryOptions(q)
_, resp, err := requireOK(op.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var out RaftConfiguration
if err := decodeBody(resp, &out); err != nil {
return nil, err
}
return &out, nil
}
// RaftRemovePeerByAddress is used to kick a stale peer (one that it in the Raft
// quorum but no longer known to Serf or the catalog) by address in the form of
// "IP:port".
func (op *Operator) RaftRemovePeerByAddress(address string, q *WriteOptions) error {
r := op.c.newRequest("DELETE", "/v1/operator/raft/peer")
r.setWriteOptions(q)
// TODO (slackpad) Currently we made address a query parameter. Once
// IDs are in place this will be DELETE /v1/operator/raft/peer/<id>.
r.params.Set("address", string(address))
_, resp, err := requireOK(op.c.doRequest(r))
if err != nil {
return err
}
resp.Body.Close()
return nil
}

View file

@ -0,0 +1,194 @@
package api
// QueryDatacenterOptions sets options about how we fail over if there are no
// healthy nodes in the local datacenter.
type QueryDatacenterOptions struct {
// NearestN is set to the number of remote datacenters to try, based on
// network coordinates.
NearestN int
// Datacenters is a fixed list of datacenters to try after NearestN. We
// never try a datacenter multiple times, so those are subtracted from
// this list before proceeding.
Datacenters []string
}
// QueryDNSOptions controls settings when query results are served over DNS.
type QueryDNSOptions struct {
// TTL is the time to live for the served DNS results.
TTL string
}
// ServiceQuery is used to query for a set of healthy nodes offering a specific
// service.
type ServiceQuery struct {
// Service is the service to query.
Service string
// Near allows baking in the name of a node to automatically distance-
// sort from. The magic "_agent" value is supported, which sorts near
// the agent which initiated the request by default.
Near string
// Failover controls what we do if there are no healthy nodes in the
// local datacenter.
Failover QueryDatacenterOptions
// If OnlyPassing is true then we will only include nodes with passing
// health checks (critical AND warning checks will cause a node to be
// discarded)
OnlyPassing bool
// Tags are a set of required and/or disallowed tags. If a tag is in
// this list it must be present. If the tag is preceded with "!" then
// it is disallowed.
Tags []string
}
// QueryTemplate carries the arguments for creating a templated query.
type QueryTemplate struct {
// Type specifies the type of the query template. Currently only
// "name_prefix_match" is supported. This field is required.
Type string
// Regexp allows specifying a regex pattern to match against the name
// of the query being executed.
Regexp string
}
// PrepatedQueryDefinition defines a complete prepared query.
type PreparedQueryDefinition struct {
// ID is this UUID-based ID for the query, always generated by Consul.
ID string
// Name is an optional friendly name for the query supplied by the
// user. NOTE - if this feature is used then it will reduce the security
// of any read ACL associated with this query/service since this name
// can be used to locate nodes with supplying any ACL.
Name string
// Session is an optional session to tie this query's lifetime to. If
// this is omitted then the query will not expire.
Session string
// Token is the ACL token used when the query was created, and it is
// used when a query is subsequently executed. This token, or a token
// with management privileges, must be used to change the query later.
Token string
// Service defines a service query (leaving things open for other types
// later).
Service ServiceQuery
// DNS has options that control how the results of this query are
// served over DNS.
DNS QueryDNSOptions
// Template is used to pass through the arguments for creating a
// prepared query with an attached template. If a template is given,
// interpolations are possible in other struct fields.
Template QueryTemplate
}
// PreparedQueryExecuteResponse has the results of executing a query.
type PreparedQueryExecuteResponse struct {
// Service is the service that was queried.
Service string
// Nodes has the nodes that were output by the query.
Nodes []ServiceEntry
// DNS has the options for serving these results over DNS.
DNS QueryDNSOptions
// Datacenter is the datacenter that these results came from.
Datacenter string
// Failovers is a count of how many times we had to query a remote
// datacenter.
Failovers int
}
// PreparedQuery can be used to query the prepared query endpoints.
type PreparedQuery struct {
c *Client
}
// PreparedQuery returns a handle to the prepared query endpoints.
func (c *Client) PreparedQuery() *PreparedQuery {
return &PreparedQuery{c}
}
// Create makes a new prepared query. The ID of the new query is returned.
func (c *PreparedQuery) Create(query *PreparedQueryDefinition, q *WriteOptions) (string, *WriteMeta, error) {
r := c.c.newRequest("POST", "/v1/query")
r.setWriteOptions(q)
r.obj = query
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return "", nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{}
wm.RequestTime = rtt
var out struct{ ID string }
if err := decodeBody(resp, &out); err != nil {
return "", nil, err
}
return out.ID, wm, nil
}
// Update makes updates to an existing prepared query.
func (c *PreparedQuery) Update(query *PreparedQueryDefinition, q *WriteOptions) (*WriteMeta, error) {
return c.c.write("/v1/query/"+query.ID, query, nil, q)
}
// List is used to fetch all the prepared queries (always requires a management
// token).
func (c *PreparedQuery) List(q *QueryOptions) ([]*PreparedQueryDefinition, *QueryMeta, error) {
var out []*PreparedQueryDefinition
qm, err := c.c.query("/v1/query", &out, q)
if err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Get is used to fetch a specific prepared query.
func (c *PreparedQuery) Get(queryID string, q *QueryOptions) ([]*PreparedQueryDefinition, *QueryMeta, error) {
var out []*PreparedQueryDefinition
qm, err := c.c.query("/v1/query/"+queryID, &out, q)
if err != nil {
return nil, nil, err
}
return out, qm, nil
}
// Delete is used to delete a specific prepared query.
func (c *PreparedQuery) Delete(queryID string, q *QueryOptions) (*QueryMeta, error) {
r := c.c.newRequest("DELETE", "/v1/query/"+queryID)
r.setQueryOptions(q)
rtt, resp, err := requireOK(c.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
qm := &QueryMeta{}
parseQueryMeta(resp, qm)
qm.RequestTime = rtt
return qm, nil
}
// Execute is used to execute a specific prepared query. You can execute using
// a query ID or name.
func (c *PreparedQuery) Execute(queryIDOrName string, q *QueryOptions) (*PreparedQueryExecuteResponse, *QueryMeta, error) {
var out *PreparedQueryExecuteResponse
qm, err := c.c.query("/v1/query/"+queryIDOrName+"/execute", &out, q)
if err != nil {
return nil, nil, err
}
return out, qm, nil
}

View file

@ -0,0 +1,24 @@
package api
// Raw can be used to do raw queries against custom endpoints
type Raw struct {
c *Client
}
// Raw returns a handle to query endpoints
func (c *Client) Raw() *Raw {
return &Raw{c}
}
// Query is used to do a GET request against an endpoint
// and deserialize the response into an interface using
// standard Consul conventions.
func (raw *Raw) Query(endpoint string, out interface{}, q *QueryOptions) (*QueryMeta, error) {
return raw.c.query(endpoint, out, q)
}
// Write is used to do a PUT request against an endpoint
// and serialize/deserialized using the standard Consul conventions.
func (raw *Raw) Write(endpoint string, in, out interface{}, q *WriteOptions) (*WriteMeta, error) {
return raw.c.write(endpoint, in, out, q)
}

View file

@ -0,0 +1,512 @@
package api
import (
"encoding/json"
"fmt"
"path"
"sync"
"time"
)
const (
// DefaultSemaphoreSessionName is the Session Name we assign if none is provided
DefaultSemaphoreSessionName = "Consul API Semaphore"
// DefaultSemaphoreSessionTTL is the default session TTL if no Session is provided
// when creating a new Semaphore. This is used because we do not have another
// other check to depend upon.
DefaultSemaphoreSessionTTL = "15s"
// DefaultSemaphoreWaitTime is how long we block for at a time to check if semaphore
// acquisition is possible. This affects the minimum time it takes to cancel
// a Semaphore acquisition.
DefaultSemaphoreWaitTime = 15 * time.Second
// DefaultSemaphoreKey is the key used within the prefix to
// use for coordination between all the contenders.
DefaultSemaphoreKey = ".lock"
// SemaphoreFlagValue is a magic flag we set to indicate a key
// is being used for a semaphore. It is used to detect a potential
// conflict with a lock.
SemaphoreFlagValue = 0xe0f69a2baa414de0
)
var (
// ErrSemaphoreHeld is returned if we attempt to double lock
ErrSemaphoreHeld = fmt.Errorf("Semaphore already held")
// ErrSemaphoreNotHeld is returned if we attempt to unlock a semaphore
// that we do not hold.
ErrSemaphoreNotHeld = fmt.Errorf("Semaphore not held")
// ErrSemaphoreInUse is returned if we attempt to destroy a semaphore
// that is in use.
ErrSemaphoreInUse = fmt.Errorf("Semaphore in use")
// ErrSemaphoreConflict is returned if the flags on a key
// used for a semaphore do not match expectation
ErrSemaphoreConflict = fmt.Errorf("Existing key does not match semaphore use")
)
// Semaphore is used to implement a distributed semaphore
// using the Consul KV primitives.
type Semaphore struct {
c *Client
opts *SemaphoreOptions
isHeld bool
sessionRenew chan struct{}
lockSession string
l sync.Mutex
}
// SemaphoreOptions is used to parameterize the Semaphore
type SemaphoreOptions struct {
Prefix string // Must be set and have write permissions
Limit int // Must be set, and be positive
Value []byte // Optional, value to associate with the contender entry
Session string // Optional, created if not specified
SessionName string // Optional, defaults to DefaultLockSessionName
SessionTTL string // Optional, defaults to DefaultLockSessionTTL
MonitorRetries int // Optional, defaults to 0 which means no retries
MonitorRetryTime time.Duration // Optional, defaults to DefaultMonitorRetryTime
SemaphoreWaitTime time.Duration // Optional, defaults to DefaultSemaphoreWaitTime
SemaphoreTryOnce bool // Optional, defaults to false which means try forever
}
// semaphoreLock is written under the DefaultSemaphoreKey and
// is used to coordinate between all the contenders.
type semaphoreLock struct {
// Limit is the integer limit of holders. This is used to
// verify that all the holders agree on the value.
Limit int
// Holders is a list of all the semaphore holders.
// It maps the session ID to true. It is used as a set effectively.
Holders map[string]bool
}
// SemaphorePrefix is used to created a Semaphore which will operate
// at the given KV prefix and uses the given limit for the semaphore.
// The prefix must have write privileges, and the limit must be agreed
// upon by all contenders.
func (c *Client) SemaphorePrefix(prefix string, limit int) (*Semaphore, error) {
opts := &SemaphoreOptions{
Prefix: prefix,
Limit: limit,
}
return c.SemaphoreOpts(opts)
}
// SemaphoreOpts is used to create a Semaphore with the given options.
// The prefix must have write privileges, and the limit must be agreed
// upon by all contenders. If a Session is not provided, one will be created.
func (c *Client) SemaphoreOpts(opts *SemaphoreOptions) (*Semaphore, error) {
if opts.Prefix == "" {
return nil, fmt.Errorf("missing prefix")
}
if opts.Limit <= 0 {
return nil, fmt.Errorf("semaphore limit must be positive")
}
if opts.SessionName == "" {
opts.SessionName = DefaultSemaphoreSessionName
}
if opts.SessionTTL == "" {
opts.SessionTTL = DefaultSemaphoreSessionTTL
} else {
if _, err := time.ParseDuration(opts.SessionTTL); err != nil {
return nil, fmt.Errorf("invalid SessionTTL: %v", err)
}
}
if opts.MonitorRetryTime == 0 {
opts.MonitorRetryTime = DefaultMonitorRetryTime
}
if opts.SemaphoreWaitTime == 0 {
opts.SemaphoreWaitTime = DefaultSemaphoreWaitTime
}
s := &Semaphore{
c: c,
opts: opts,
}
return s, nil
}
// Acquire attempts to reserve a slot in the semaphore, blocking until
// success, interrupted via the stopCh or an error is encountered.
// Providing a non-nil stopCh can be used to abort the attempt.
// On success, a channel is returned that represents our slot.
// This channel could be closed at any time due to session invalidation,
// communication errors, operator intervention, etc. It is NOT safe to
// assume that the slot is held until Release() unless the Session is specifically
// created without any associated health checks. By default Consul sessions
// prefer liveness over safety and an application must be able to handle
// the session being lost.
func (s *Semaphore) Acquire(stopCh <-chan struct{}) (<-chan struct{}, error) {
// Hold the lock as we try to acquire
s.l.Lock()
defer s.l.Unlock()
// Check if we already hold the semaphore
if s.isHeld {
return nil, ErrSemaphoreHeld
}
// Check if we need to create a session first
s.lockSession = s.opts.Session
if s.lockSession == "" {
if sess, err := s.createSession(); err != nil {
return nil, fmt.Errorf("failed to create session: %v", err)
} else {
s.sessionRenew = make(chan struct{})
s.lockSession = sess
session := s.c.Session()
go session.RenewPeriodic(s.opts.SessionTTL, sess, nil, s.sessionRenew)
// If we fail to acquire the lock, cleanup the session
defer func() {
if !s.isHeld {
close(s.sessionRenew)
s.sessionRenew = nil
}
}()
}
}
// Create the contender entry
kv := s.c.KV()
made, _, err := kv.Acquire(s.contenderEntry(s.lockSession), nil)
if err != nil || !made {
return nil, fmt.Errorf("failed to make contender entry: %v", err)
}
// Setup the query options
qOpts := &QueryOptions{
WaitTime: s.opts.SemaphoreWaitTime,
}
start := time.Now()
attempts := 0
WAIT:
// Check if we should quit
select {
case <-stopCh:
return nil, nil
default:
}
// Handle the one-shot mode.
if s.opts.SemaphoreTryOnce && attempts > 0 {
elapsed := time.Now().Sub(start)
if elapsed > qOpts.WaitTime {
return nil, nil
}
qOpts.WaitTime -= elapsed
}
attempts++
// Read the prefix
pairs, meta, err := kv.List(s.opts.Prefix, qOpts)
if err != nil {
return nil, fmt.Errorf("failed to read prefix: %v", err)
}
// Decode the lock
lockPair := s.findLock(pairs)
if lockPair.Flags != SemaphoreFlagValue {
return nil, ErrSemaphoreConflict
}
lock, err := s.decodeLock(lockPair)
if err != nil {
return nil, err
}
// Verify we agree with the limit
if lock.Limit != s.opts.Limit {
return nil, fmt.Errorf("semaphore limit conflict (lock: %d, local: %d)",
lock.Limit, s.opts.Limit)
}
// Prune the dead holders
s.pruneDeadHolders(lock, pairs)
// Check if the lock is held
if len(lock.Holders) >= lock.Limit {
qOpts.WaitIndex = meta.LastIndex
goto WAIT
}
// Create a new lock with us as a holder
lock.Holders[s.lockSession] = true
newLock, err := s.encodeLock(lock, lockPair.ModifyIndex)
if err != nil {
return nil, err
}
// Attempt the acquisition
didSet, _, err := kv.CAS(newLock, nil)
if err != nil {
return nil, fmt.Errorf("failed to update lock: %v", err)
}
if !didSet {
// Update failed, could have been a race with another contender,
// retry the operation
goto WAIT
}
// Watch to ensure we maintain ownership of the slot
lockCh := make(chan struct{})
go s.monitorLock(s.lockSession, lockCh)
// Set that we own the lock
s.isHeld = true
// Acquired! All done
return lockCh, nil
}
// Release is used to voluntarily give up our semaphore slot. It is
// an error to call this if the semaphore has not been acquired.
func (s *Semaphore) Release() error {
// Hold the lock as we try to release
s.l.Lock()
defer s.l.Unlock()
// Ensure the lock is actually held
if !s.isHeld {
return ErrSemaphoreNotHeld
}
// Set that we no longer own the lock
s.isHeld = false
// Stop the session renew
if s.sessionRenew != nil {
defer func() {
close(s.sessionRenew)
s.sessionRenew = nil
}()
}
// Get and clear the lock session
lockSession := s.lockSession
s.lockSession = ""
// Remove ourselves as a lock holder
kv := s.c.KV()
key := path.Join(s.opts.Prefix, DefaultSemaphoreKey)
READ:
pair, _, err := kv.Get(key, nil)
if err != nil {
return err
}
if pair == nil {
pair = &KVPair{}
}
lock, err := s.decodeLock(pair)
if err != nil {
return err
}
// Create a new lock without us as a holder
if _, ok := lock.Holders[lockSession]; ok {
delete(lock.Holders, lockSession)
newLock, err := s.encodeLock(lock, pair.ModifyIndex)
if err != nil {
return err
}
// Swap the locks
didSet, _, err := kv.CAS(newLock, nil)
if err != nil {
return fmt.Errorf("failed to update lock: %v", err)
}
if !didSet {
goto READ
}
}
// Destroy the contender entry
contenderKey := path.Join(s.opts.Prefix, lockSession)
if _, err := kv.Delete(contenderKey, nil); err != nil {
return err
}
return nil
}
// Destroy is used to cleanup the semaphore entry. It is not necessary
// to invoke. It will fail if the semaphore is in use.
func (s *Semaphore) Destroy() error {
// Hold the lock as we try to acquire
s.l.Lock()
defer s.l.Unlock()
// Check if we already hold the semaphore
if s.isHeld {
return ErrSemaphoreHeld
}
// List for the semaphore
kv := s.c.KV()
pairs, _, err := kv.List(s.opts.Prefix, nil)
if err != nil {
return fmt.Errorf("failed to read prefix: %v", err)
}
// Find the lock pair, bail if it doesn't exist
lockPair := s.findLock(pairs)
if lockPair.ModifyIndex == 0 {
return nil
}
if lockPair.Flags != SemaphoreFlagValue {
return ErrSemaphoreConflict
}
// Decode the lock
lock, err := s.decodeLock(lockPair)
if err != nil {
return err
}
// Prune the dead holders
s.pruneDeadHolders(lock, pairs)
// Check if there are any holders
if len(lock.Holders) > 0 {
return ErrSemaphoreInUse
}
// Attempt the delete
didRemove, _, err := kv.DeleteCAS(lockPair, nil)
if err != nil {
return fmt.Errorf("failed to remove semaphore: %v", err)
}
if !didRemove {
return ErrSemaphoreInUse
}
return nil
}
// createSession is used to create a new managed session
func (s *Semaphore) createSession() (string, error) {
session := s.c.Session()
se := &SessionEntry{
Name: s.opts.SessionName,
TTL: s.opts.SessionTTL,
Behavior: SessionBehaviorDelete,
}
id, _, err := session.Create(se, nil)
if err != nil {
return "", err
}
return id, nil
}
// contenderEntry returns a formatted KVPair for the contender
func (s *Semaphore) contenderEntry(session string) *KVPair {
return &KVPair{
Key: path.Join(s.opts.Prefix, session),
Value: s.opts.Value,
Session: session,
Flags: SemaphoreFlagValue,
}
}
// findLock is used to find the KV Pair which is used for coordination
func (s *Semaphore) findLock(pairs KVPairs) *KVPair {
key := path.Join(s.opts.Prefix, DefaultSemaphoreKey)
for _, pair := range pairs {
if pair.Key == key {
return pair
}
}
return &KVPair{Flags: SemaphoreFlagValue}
}
// decodeLock is used to decode a semaphoreLock from an
// entry in Consul
func (s *Semaphore) decodeLock(pair *KVPair) (*semaphoreLock, error) {
// Handle if there is no lock
if pair == nil || pair.Value == nil {
return &semaphoreLock{
Limit: s.opts.Limit,
Holders: make(map[string]bool),
}, nil
}
l := &semaphoreLock{}
if err := json.Unmarshal(pair.Value, l); err != nil {
return nil, fmt.Errorf("lock decoding failed: %v", err)
}
return l, nil
}
// encodeLock is used to encode a semaphoreLock into a KVPair
// that can be PUT
func (s *Semaphore) encodeLock(l *semaphoreLock, oldIndex uint64) (*KVPair, error) {
enc, err := json.Marshal(l)
if err != nil {
return nil, fmt.Errorf("lock encoding failed: %v", err)
}
pair := &KVPair{
Key: path.Join(s.opts.Prefix, DefaultSemaphoreKey),
Value: enc,
Flags: SemaphoreFlagValue,
ModifyIndex: oldIndex,
}
return pair, nil
}
// pruneDeadHolders is used to remove all the dead lock holders
func (s *Semaphore) pruneDeadHolders(lock *semaphoreLock, pairs KVPairs) {
// Gather all the live holders
alive := make(map[string]struct{}, len(pairs))
for _, pair := range pairs {
if pair.Session != "" {
alive[pair.Session] = struct{}{}
}
}
// Remove any holders that are dead
for holder := range lock.Holders {
if _, ok := alive[holder]; !ok {
delete(lock.Holders, holder)
}
}
}
// monitorLock is a long running routine to monitor a semaphore ownership
// It closes the stopCh if we lose our slot.
func (s *Semaphore) monitorLock(session string, stopCh chan struct{}) {
defer close(stopCh)
kv := s.c.KV()
opts := &QueryOptions{RequireConsistent: true}
WAIT:
retries := s.opts.MonitorRetries
RETRY:
pairs, meta, err := kv.List(s.opts.Prefix, opts)
if err != nil {
// If configured we can try to ride out a brief Consul unavailability
// by doing retries. Note that we have to attempt the retry in a non-
// blocking fashion so that we have a clean place to reset the retry
// counter if service is restored.
if retries > 0 && IsServerError(err) {
time.Sleep(s.opts.MonitorRetryTime)
retries--
opts.WaitIndex = 0
goto RETRY
}
return
}
lockPair := s.findLock(pairs)
lock, err := s.decodeLock(lockPair)
if err != nil {
return
}
s.pruneDeadHolders(lock, pairs)
if _, ok := lock.Holders[session]; ok {
opts.WaitIndex = meta.LastIndex
goto WAIT
}
}

View file

@ -0,0 +1,217 @@
package api
import (
"errors"
"fmt"
"time"
)
const (
// SessionBehaviorRelease is the default behavior and causes
// all associated locks to be released on session invalidation.
SessionBehaviorRelease = "release"
// SessionBehaviorDelete is new in Consul 0.5 and changes the
// behavior to delete all associated locks on session invalidation.
// It can be used in a way similar to Ephemeral Nodes in ZooKeeper.
SessionBehaviorDelete = "delete"
)
var ErrSessionExpired = errors.New("session expired")
// SessionEntry represents a session in consul
type SessionEntry struct {
CreateIndex uint64
ID string
Name string
Node string
Checks []string
LockDelay time.Duration
Behavior string
TTL string
}
// Session can be used to query the Session endpoints
type Session struct {
c *Client
}
// Session returns a handle to the session endpoints
func (c *Client) Session() *Session {
return &Session{c}
}
// CreateNoChecks is like Create but is used specifically to create
// a session with no associated health checks.
func (s *Session) CreateNoChecks(se *SessionEntry, q *WriteOptions) (string, *WriteMeta, error) {
body := make(map[string]interface{})
body["Checks"] = []string{}
if se != nil {
if se.Name != "" {
body["Name"] = se.Name
}
if se.Node != "" {
body["Node"] = se.Node
}
if se.LockDelay != 0 {
body["LockDelay"] = durToMsec(se.LockDelay)
}
if se.Behavior != "" {
body["Behavior"] = se.Behavior
}
if se.TTL != "" {
body["TTL"] = se.TTL
}
}
return s.create(body, q)
}
// Create makes a new session. Providing a session entry can
// customize the session. It can also be nil to use defaults.
func (s *Session) Create(se *SessionEntry, q *WriteOptions) (string, *WriteMeta, error) {
var obj interface{}
if se != nil {
body := make(map[string]interface{})
obj = body
if se.Name != "" {
body["Name"] = se.Name
}
if se.Node != "" {
body["Node"] = se.Node
}
if se.LockDelay != 0 {
body["LockDelay"] = durToMsec(se.LockDelay)
}
if len(se.Checks) > 0 {
body["Checks"] = se.Checks
}
if se.Behavior != "" {
body["Behavior"] = se.Behavior
}
if se.TTL != "" {
body["TTL"] = se.TTL
}
}
return s.create(obj, q)
}
func (s *Session) create(obj interface{}, q *WriteOptions) (string, *WriteMeta, error) {
var out struct{ ID string }
wm, err := s.c.write("/v1/session/create", obj, &out, q)
if err != nil {
return "", nil, err
}
return out.ID, wm, nil
}
// Destroy invalidates a given session
func (s *Session) Destroy(id string, q *WriteOptions) (*WriteMeta, error) {
wm, err := s.c.write("/v1/session/destroy/"+id, nil, nil, q)
if err != nil {
return nil, err
}
return wm, nil
}
// Renew renews the TTL on a given session
func (s *Session) Renew(id string, q *WriteOptions) (*SessionEntry, *WriteMeta, error) {
r := s.c.newRequest("PUT", "/v1/session/renew/"+id)
r.setWriteOptions(q)
rtt, resp, err := s.c.doRequest(r)
if err != nil {
return nil, nil, err
}
defer resp.Body.Close()
wm := &WriteMeta{RequestTime: rtt}
if resp.StatusCode == 404 {
return nil, wm, nil
} else if resp.StatusCode != 200 {
return nil, nil, fmt.Errorf("Unexpected response code: %d", resp.StatusCode)
}
var entries []*SessionEntry
if err := decodeBody(resp, &entries); err != nil {
return nil, nil, fmt.Errorf("Failed to read response: %v", err)
}
if len(entries) > 0 {
return entries[0], wm, nil
}
return nil, wm, nil
}
// RenewPeriodic is used to periodically invoke Session.Renew on a
// session until a doneCh is closed. This is meant to be used in a long running
// goroutine to ensure a session stays valid.
func (s *Session) RenewPeriodic(initialTTL string, id string, q *WriteOptions, doneCh chan struct{}) error {
ttl, err := time.ParseDuration(initialTTL)
if err != nil {
return err
}
waitDur := ttl / 2
lastRenewTime := time.Now()
var lastErr error
for {
if time.Since(lastRenewTime) > ttl {
return lastErr
}
select {
case <-time.After(waitDur):
entry, _, err := s.Renew(id, q)
if err != nil {
waitDur = time.Second
lastErr = err
continue
}
if entry == nil {
return ErrSessionExpired
}
// Handle the server updating the TTL
ttl, _ = time.ParseDuration(entry.TTL)
waitDur = ttl / 2
lastRenewTime = time.Now()
case <-doneCh:
// Attempt a session destroy
s.Destroy(id, q)
return nil
}
}
}
// Info looks up a single session
func (s *Session) Info(id string, q *QueryOptions) (*SessionEntry, *QueryMeta, error) {
var entries []*SessionEntry
qm, err := s.c.query("/v1/session/info/"+id, &entries, q)
if err != nil {
return nil, nil, err
}
if len(entries) > 0 {
return entries[0], qm, nil
}
return nil, qm, nil
}
// List gets sessions for a node
func (s *Session) Node(node string, q *QueryOptions) ([]*SessionEntry, *QueryMeta, error) {
var entries []*SessionEntry
qm, err := s.c.query("/v1/session/node/"+node, &entries, q)
if err != nil {
return nil, nil, err
}
return entries, qm, nil
}
// List gets all active sessions
func (s *Session) List(q *QueryOptions) ([]*SessionEntry, *QueryMeta, error) {
var entries []*SessionEntry
qm, err := s.c.query("/v1/session/list", &entries, q)
if err != nil {
return nil, nil, err
}
return entries, qm, nil
}

View file

@ -0,0 +1,43 @@
package api
// Status can be used to query the Status endpoints
type Status struct {
c *Client
}
// Status returns a handle to the status endpoints
func (c *Client) Status() *Status {
return &Status{c}
}
// Leader is used to query for a known leader
func (s *Status) Leader() (string, error) {
r := s.c.newRequest("GET", "/v1/status/leader")
_, resp, err := requireOK(s.c.doRequest(r))
if err != nil {
return "", err
}
defer resp.Body.Close()
var leader string
if err := decodeBody(resp, &leader); err != nil {
return "", err
}
return leader, nil
}
// Peers is used to query for a known raft peers
func (s *Status) Peers() ([]string, error) {
r := s.c.newRequest("GET", "/v1/status/peers")
_, resp, err := requireOK(s.c.doRequest(r))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var peers []string
if err := decodeBody(resp, &peers); err != nil {
return nil, err
}
return peers, nil
}

View file

@ -0,0 +1,186 @@
package main
import (
"os"
"os/signal"
"syscall"
"github.com/hashicorp/consul/command"
"github.com/hashicorp/consul/command/agent"
"github.com/mitchellh/cli"
)
// Commands is the mapping of all the available Consul commands.
var Commands map[string]cli.CommandFactory
func init() {
ui := &cli.BasicUi{Writer: os.Stdout}
Commands = map[string]cli.CommandFactory{
"agent": func() (cli.Command, error) {
return &agent.Command{
Revision: GitCommit,
Version: Version,
VersionPrerelease: VersionPrerelease,
HumanVersion: GetHumanVersion(),
Ui: ui,
ShutdownCh: make(chan struct{}),
}, nil
},
"configtest": func() (cli.Command, error) {
return &command.ConfigTestCommand{
Ui: ui,
}, nil
},
"event": func() (cli.Command, error) {
return &command.EventCommand{
Ui: ui,
}, nil
},
"exec": func() (cli.Command, error) {
return &command.ExecCommand{
ShutdownCh: makeShutdownCh(),
Ui: ui,
}, nil
},
"force-leave": func() (cli.Command, error) {
return &command.ForceLeaveCommand{
Ui: ui,
}, nil
},
"kv": func() (cli.Command, error) {
return &command.KVCommand{
Ui: ui,
}, nil
},
"kv delete": func() (cli.Command, error) {
return &command.KVDeleteCommand{
Ui: ui,
}, nil
},
"kv get": func() (cli.Command, error) {
return &command.KVGetCommand{
Ui: ui,
}, nil
},
"kv put": func() (cli.Command, error) {
return &command.KVPutCommand{
Ui: ui,
}, nil
},
"join": func() (cli.Command, error) {
return &command.JoinCommand{
Ui: ui,
}, nil
},
"keygen": func() (cli.Command, error) {
return &command.KeygenCommand{
Ui: ui,
}, nil
},
"keyring": func() (cli.Command, error) {
return &command.KeyringCommand{
Ui: ui,
}, nil
},
"leave": func() (cli.Command, error) {
return &command.LeaveCommand{
Ui: ui,
}, nil
},
"lock": func() (cli.Command, error) {
return &command.LockCommand{
ShutdownCh: makeShutdownCh(),
Ui: ui,
}, nil
},
"maint": func() (cli.Command, error) {
return &command.MaintCommand{
Ui: ui,
}, nil
},
"members": func() (cli.Command, error) {
return &command.MembersCommand{
Ui: ui,
}, nil
},
"monitor": func() (cli.Command, error) {
return &command.MonitorCommand{
ShutdownCh: makeShutdownCh(),
Ui: ui,
}, nil
},
"operator": func() (cli.Command, error) {
return &command.OperatorCommand{
Ui: ui,
}, nil
},
"info": func() (cli.Command, error) {
return &command.InfoCommand{
Ui: ui,
}, nil
},
"reload": func() (cli.Command, error) {
return &command.ReloadCommand{
Ui: ui,
}, nil
},
"rtt": func() (cli.Command, error) {
return &command.RTTCommand{
Ui: ui,
}, nil
},
"version": func() (cli.Command, error) {
return &command.VersionCommand{
HumanVersion: GetHumanVersion(),
Ui: ui,
}, nil
},
"watch": func() (cli.Command, error) {
return &command.WatchCommand{
ShutdownCh: makeShutdownCh(),
Ui: ui,
}, nil
},
}
}
// makeShutdownCh returns a channel that can be used for shutdown
// notifications for commands. This channel will send a message for every
// interrupt or SIGTERM received.
func makeShutdownCh() <-chan struct{} {
resultCh := make(chan struct{})
signalCh := make(chan os.Signal, 4)
signal.Notify(signalCh, os.Interrupt, syscall.SIGTERM)
go func() {
for {
<-signalCh
resultCh <- struct{}{}
}
}()
return resultCh
}

53
integration/vendor/github.com/hashicorp/consul/main.go generated vendored Normal file
View file

@ -0,0 +1,53 @@
package main
import (
"fmt"
"github.com/mitchellh/cli"
"io/ioutil"
"log"
"os"
"github.com/hashicorp/consul/lib"
)
func init() {
lib.SeedMathRand()
}
func main() {
os.Exit(realMain())
}
func realMain() int {
log.SetOutput(ioutil.Discard)
// Get the command line args. We shortcut "--version" and "-v" to
// just show the version.
args := os.Args[1:]
for _, arg := range args {
if arg == "--" {
break
}
if arg == "-v" || arg == "--version" {
newArgs := make([]string, len(args)+1)
newArgs[0] = "version"
copy(newArgs[1:], args)
args = newArgs
break
}
}
cli := &cli.CLI{
Args: args,
Commands: Commands,
HelpFunc: cli.BasicHelpFunc("consul"),
}
exitCode, err := cli.Run()
if err != nil {
fmt.Fprintf(os.Stderr, "Error executing CLI: %s\n", err.Error())
return 1
}
return exitCode
}

View file

@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2014 Simon Eskildsen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View file

@ -0,0 +1,64 @@
package logrus
// The following code was sourced and modified from the
// https://bitbucket.org/tebeka/atexit package governed by the following license:
//
// Copyright (c) 2012 Miki Tebeka <miki.tebeka@gmail.com>.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
// the Software, and to permit persons to whom the Software is furnished to do so,
// subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
// COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
// IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
// CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import (
"fmt"
"os"
)
var handlers = []func(){}
func runHandler(handler func()) {
defer func() {
if err := recover(); err != nil {
fmt.Fprintln(os.Stderr, "Error: Logrus exit handler error:", err)
}
}()
handler()
}
func runHandlers() {
for _, handler := range handlers {
runHandler(handler)
}
}
// Exit runs all the Logrus atexit handlers and then terminates the program using os.Exit(code)
func Exit(code int) {
runHandlers()
os.Exit(code)
}
// RegisterExitHandler adds a Logrus Exit handler, call logrus.Exit to invoke
// all handlers. The handlers will also be invoked when any Fatal log entry is
// made.
//
// This method is useful when a caller wishes to use logrus to log a fatal
// message but also needs to gracefully shutdown. An example usecase could be
// closing database connections, or sending a alert that the application is
// closing.
func RegisterExitHandler(handler func()) {
handlers = append(handlers, handler)
}

View file

@ -0,0 +1,26 @@
/*
Package logrus is a structured logger for Go, completely API compatible with the standard library logger.
The simplest way to use Logrus is simply the package-level exported logger:
package main
import (
log "github.com/Sirupsen/logrus"
)
func main() {
log.WithFields(log.Fields{
"animal": "walrus",
"number": 1,
"size": 10,
}).Info("A walrus appears")
}
Output:
time="2015-09-07T08:48:33Z" level=info msg="A walrus appears" animal=walrus number=1 size=10
For a full guide visit https://github.com/Sirupsen/logrus
*/
package logrus

View file

@ -0,0 +1,264 @@
package logrus
import (
"bytes"
"fmt"
"io"
"os"
"time"
)
// Defines the key when adding errors using WithError.
var ErrorKey = "error"
// An entry is the final or intermediate Logrus logging entry. It contains all
// the fields passed with WithField{,s}. It's finally logged when Debug, Info,
// Warn, Error, Fatal or Panic is called on it. These objects can be reused and
// passed around as much as you wish to avoid field duplication.
type Entry struct {
Logger *Logger
// Contains all the fields set by the user.
Data Fields
// Time at which the log entry was created
Time time.Time
// Level the log entry was logged at: Debug, Info, Warn, Error, Fatal or Panic
Level Level
// Message passed to Debug, Info, Warn, Error, Fatal or Panic
Message string
}
func NewEntry(logger *Logger) *Entry {
return &Entry{
Logger: logger,
// Default is three fields, give a little extra room
Data: make(Fields, 5),
}
}
// Returns a reader for the entry, which is a proxy to the formatter.
func (entry *Entry) Reader() (*bytes.Buffer, error) {
serialized, err := entry.Logger.Formatter.Format(entry)
return bytes.NewBuffer(serialized), err
}
// Returns the string representation from the reader and ultimately the
// formatter.
func (entry *Entry) String() (string, error) {
reader, err := entry.Reader()
if err != nil {
return "", err
}
return reader.String(), err
}
// Add an error as single field (using the key defined in ErrorKey) to the Entry.
func (entry *Entry) WithError(err error) *Entry {
return entry.WithField(ErrorKey, err)
}
// Add a single field to the Entry.
func (entry *Entry) WithField(key string, value interface{}) *Entry {
return entry.WithFields(Fields{key: value})
}
// Add a map of fields to the Entry.
func (entry *Entry) WithFields(fields Fields) *Entry {
data := make(Fields, len(entry.Data)+len(fields))
for k, v := range entry.Data {
data[k] = v
}
for k, v := range fields {
data[k] = v
}
return &Entry{Logger: entry.Logger, Data: data}
}
// This function is not declared with a pointer value because otherwise
// race conditions will occur when using multiple goroutines
func (entry Entry) log(level Level, msg string) {
entry.Time = time.Now()
entry.Level = level
entry.Message = msg
if err := entry.Logger.Hooks.Fire(level, &entry); err != nil {
entry.Logger.mu.Lock()
fmt.Fprintf(os.Stderr, "Failed to fire hook: %v\n", err)
entry.Logger.mu.Unlock()
}
reader, err := entry.Reader()
if err != nil {
entry.Logger.mu.Lock()
fmt.Fprintf(os.Stderr, "Failed to obtain reader, %v\n", err)
entry.Logger.mu.Unlock()
}
entry.Logger.mu.Lock()
defer entry.Logger.mu.Unlock()
_, err = io.Copy(entry.Logger.Out, reader)
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to write to log, %v\n", err)
}
// To avoid Entry#log() returning a value that only would make sense for
// panic() to use in Entry#Panic(), we avoid the allocation by checking
// directly here.
if level <= PanicLevel {
panic(&entry)
}
}
func (entry *Entry) Debug(args ...interface{}) {
if entry.Logger.Level >= DebugLevel {
entry.log(DebugLevel, fmt.Sprint(args...))
}
}
func (entry *Entry) Print(args ...interface{}) {
entry.Info(args...)
}
func (entry *Entry) Info(args ...interface{}) {
if entry.Logger.Level >= InfoLevel {
entry.log(InfoLevel, fmt.Sprint(args...))
}
}
func (entry *Entry) Warn(args ...interface{}) {
if entry.Logger.Level >= WarnLevel {
entry.log(WarnLevel, fmt.Sprint(args...))
}
}
func (entry *Entry) Warning(args ...interface{}) {
entry.Warn(args...)
}
func (entry *Entry) Error(args ...interface{}) {
if entry.Logger.Level >= ErrorLevel {
entry.log(ErrorLevel, fmt.Sprint(args...))
}
}
func (entry *Entry) Fatal(args ...interface{}) {
if entry.Logger.Level >= FatalLevel {
entry.log(FatalLevel, fmt.Sprint(args...))
}
Exit(1)
}
func (entry *Entry) Panic(args ...interface{}) {
if entry.Logger.Level >= PanicLevel {
entry.log(PanicLevel, fmt.Sprint(args...))
}
panic(fmt.Sprint(args...))
}
// Entry Printf family functions
func (entry *Entry) Debugf(format string, args ...interface{}) {
if entry.Logger.Level >= DebugLevel {
entry.Debug(fmt.Sprintf(format, args...))
}
}
func (entry *Entry) Infof(format string, args ...interface{}) {
if entry.Logger.Level >= InfoLevel {
entry.Info(fmt.Sprintf(format, args...))
}
}
func (entry *Entry) Printf(format string, args ...interface{}) {
entry.Infof(format, args...)
}
func (entry *Entry) Warnf(format string, args ...interface{}) {
if entry.Logger.Level >= WarnLevel {
entry.Warn(fmt.Sprintf(format, args...))
}
}
func (entry *Entry) Warningf(format string, args ...interface{}) {
entry.Warnf(format, args...)
}
func (entry *Entry) Errorf(format string, args ...interface{}) {
if entry.Logger.Level >= ErrorLevel {
entry.Error(fmt.Sprintf(format, args...))
}
}
func (entry *Entry) Fatalf(format string, args ...interface{}) {
if entry.Logger.Level >= FatalLevel {
entry.Fatal(fmt.Sprintf(format, args...))
}
Exit(1)
}
func (entry *Entry) Panicf(format string, args ...interface{}) {
if entry.Logger.Level >= PanicLevel {
entry.Panic(fmt.Sprintf(format, args...))
}
}
// Entry Println family functions
func (entry *Entry) Debugln(args ...interface{}) {
if entry.Logger.Level >= DebugLevel {
entry.Debug(entry.sprintlnn(args...))
}
}
func (entry *Entry) Infoln(args ...interface{}) {
if entry.Logger.Level >= InfoLevel {
entry.Info(entry.sprintlnn(args...))
}
}
func (entry *Entry) Println(args ...interface{}) {
entry.Infoln(args...)
}
func (entry *Entry) Warnln(args ...interface{}) {
if entry.Logger.Level >= WarnLevel {
entry.Warn(entry.sprintlnn(args...))
}
}
func (entry *Entry) Warningln(args ...interface{}) {
entry.Warnln(args...)
}
func (entry *Entry) Errorln(args ...interface{}) {
if entry.Logger.Level >= ErrorLevel {
entry.Error(entry.sprintlnn(args...))
}
}
func (entry *Entry) Fatalln(args ...interface{}) {
if entry.Logger.Level >= FatalLevel {
entry.Fatal(entry.sprintlnn(args...))
}
Exit(1)
}
func (entry *Entry) Panicln(args ...interface{}) {
if entry.Logger.Level >= PanicLevel {
entry.Panic(entry.sprintlnn(args...))
}
}
// Sprintlnn => Sprint no newline. This is to get the behavior of how
// fmt.Sprintln where spaces are always added between operands, regardless of
// their type. Instead of vendoring the Sprintln implementation to spare a
// string allocation, we do the simplest thing.
func (entry *Entry) sprintlnn(args ...interface{}) string {
msg := fmt.Sprintln(args...)
return msg[:len(msg)-1]
}

View file

@ -0,0 +1,193 @@
package logrus
import (
"io"
)
var (
// std is the name of the standard logger in stdlib `log`
std = New()
)
func StandardLogger() *Logger {
return std
}
// SetOutput sets the standard logger output.
func SetOutput(out io.Writer) {
std.mu.Lock()
defer std.mu.Unlock()
std.Out = out
}
// SetFormatter sets the standard logger formatter.
func SetFormatter(formatter Formatter) {
std.mu.Lock()
defer std.mu.Unlock()
std.Formatter = formatter
}
// SetLevel sets the standard logger level.
func SetLevel(level Level) {
std.mu.Lock()
defer std.mu.Unlock()
std.Level = level
}
// GetLevel returns the standard logger level.
func GetLevel() Level {
std.mu.Lock()
defer std.mu.Unlock()
return std.Level
}
// AddHook adds a hook to the standard logger hooks.
func AddHook(hook Hook) {
std.mu.Lock()
defer std.mu.Unlock()
std.Hooks.Add(hook)
}
// WithError creates an entry from the standard logger and adds an error to it, using the value defined in ErrorKey as key.
func WithError(err error) *Entry {
return std.WithField(ErrorKey, err)
}
// WithField creates an entry from the standard logger and adds a field to
// it. If you want multiple fields, use `WithFields`.
//
// Note that it doesn't log until you call Debug, Print, Info, Warn, Fatal
// or Panic on the Entry it returns.
func WithField(key string, value interface{}) *Entry {
return std.WithField(key, value)
}
// WithFields creates an entry from the standard logger and adds multiple
// fields to it. This is simply a helper for `WithField`, invoking it
// once for each field.
//
// Note that it doesn't log until you call Debug, Print, Info, Warn, Fatal
// or Panic on the Entry it returns.
func WithFields(fields Fields) *Entry {
return std.WithFields(fields)
}
// Debug logs a message at level Debug on the standard logger.
func Debug(args ...interface{}) {
std.Debug(args...)
}
// Print logs a message at level Info on the standard logger.
func Print(args ...interface{}) {
std.Print(args...)
}
// Info logs a message at level Info on the standard logger.
func Info(args ...interface{}) {
std.Info(args...)
}
// Warn logs a message at level Warn on the standard logger.
func Warn(args ...interface{}) {
std.Warn(args...)
}
// Warning logs a message at level Warn on the standard logger.
func Warning(args ...interface{}) {
std.Warning(args...)
}
// Error logs a message at level Error on the standard logger.
func Error(args ...interface{}) {
std.Error(args...)
}
// Panic logs a message at level Panic on the standard logger.
func Panic(args ...interface{}) {
std.Panic(args...)
}
// Fatal logs a message at level Fatal on the standard logger.
func Fatal(args ...interface{}) {
std.Fatal(args...)
}
// Debugf logs a message at level Debug on the standard logger.
func Debugf(format string, args ...interface{}) {
std.Debugf(format, args...)
}
// Printf logs a message at level Info on the standard logger.
func Printf(format string, args ...interface{}) {
std.Printf(format, args...)
}
// Infof logs a message at level Info on the standard logger.
func Infof(format string, args ...interface{}) {
std.Infof(format, args...)
}
// Warnf logs a message at level Warn on the standard logger.
func Warnf(format string, args ...interface{}) {
std.Warnf(format, args...)
}
// Warningf logs a message at level Warn on the standard logger.
func Warningf(format string, args ...interface{}) {
std.Warningf(format, args...)
}
// Errorf logs a message at level Error on the standard logger.
func Errorf(format string, args ...interface{}) {
std.Errorf(format, args...)
}
// Panicf logs a message at level Panic on the standard logger.
func Panicf(format string, args ...interface{}) {
std.Panicf(format, args...)
}
// Fatalf logs a message at level Fatal on the standard logger.
func Fatalf(format string, args ...interface{}) {
std.Fatalf(format, args...)
}
// Debugln logs a message at level Debug on the standard logger.
func Debugln(args ...interface{}) {
std.Debugln(args...)
}
// Println logs a message at level Info on the standard logger.
func Println(args ...interface{}) {
std.Println(args...)
}
// Infoln logs a message at level Info on the standard logger.
func Infoln(args ...interface{}) {
std.Infoln(args...)
}
// Warnln logs a message at level Warn on the standard logger.
func Warnln(args ...interface{}) {
std.Warnln(args...)
}
// Warningln logs a message at level Warn on the standard logger.
func Warningln(args ...interface{}) {
std.Warningln(args...)
}
// Errorln logs a message at level Error on the standard logger.
func Errorln(args ...interface{}) {
std.Errorln(args...)
}
// Panicln logs a message at level Panic on the standard logger.
func Panicln(args ...interface{}) {
std.Panicln(args...)
}
// Fatalln logs a message at level Fatal on the standard logger.
func Fatalln(args ...interface{}) {
std.Fatalln(args...)
}

View file

@ -0,0 +1,45 @@
package logrus
import "time"
const DefaultTimestampFormat = time.RFC3339
// The Formatter interface is used to implement a custom Formatter. It takes an
// `Entry`. It exposes all the fields, including the default ones:
//
// * `entry.Data["msg"]`. The message passed from Info, Warn, Error ..
// * `entry.Data["time"]`. The timestamp.
// * `entry.Data["level"]. The level the entry was logged at.
//
// Any additional fields added with `WithField` or `WithFields` are also in
// `entry.Data`. Format is expected to return an array of bytes which are then
// logged to `logger.Out`.
type Formatter interface {
Format(*Entry) ([]byte, error)
}
// This is to not silently overwrite `time`, `msg` and `level` fields when
// dumping it. If this code wasn't there doing:
//
// logrus.WithField("level", 1).Info("hello")
//
// Would just silently drop the user provided level. Instead with this code
// it'll logged as:
//
// {"level": "info", "fields.level": 1, "msg": "hello", "time": "..."}
//
// It's not exported because it's still using Data in an opinionated way. It's to
// avoid code duplication between the two default formatters.
func prefixFieldClashes(data Fields) {
if t, ok := data["time"]; ok {
data["fields.time"] = t
}
if m, ok := data["msg"]; ok {
data["fields.msg"] = m
}
if l, ok := data["level"]; ok {
data["fields.level"] = l
}
}

View file

@ -0,0 +1,34 @@
package logrus
// A hook to be fired when logging on the logging levels returned from
// `Levels()` on your implementation of the interface. Note that this is not
// fired in a goroutine or a channel with workers, you should handle such
// functionality yourself if your call is non-blocking and you don't wish for
// the logging calls for levels returned from `Levels()` to block.
type Hook interface {
Levels() []Level
Fire(*Entry) error
}
// Internal type for storing the hooks on a logger instance.
type LevelHooks map[Level][]Hook
// Add a hook to an instance of logger. This is called with
// `log.Hooks.Add(new(MyHook))` where `MyHook` implements the `Hook` interface.
func (hooks LevelHooks) Add(hook Hook) {
for _, level := range hook.Levels() {
hooks[level] = append(hooks[level], hook)
}
}
// Fire all the hooks for the passed level. Used by `entry.log` to fire
// appropriate hooks for a log entry.
func (hooks LevelHooks) Fire(level Level, entry *Entry) error {
for _, hook := range hooks[level] {
if err := hook.Fire(entry); err != nil {
return err
}
}
return nil
}

View file

@ -0,0 +1,41 @@
package logrus
import (
"encoding/json"
"fmt"
)
type JSONFormatter struct {
// TimestampFormat sets the format used for marshaling timestamps.
TimestampFormat string
}
func (f *JSONFormatter) Format(entry *Entry) ([]byte, error) {
data := make(Fields, len(entry.Data)+3)
for k, v := range entry.Data {
switch v := v.(type) {
case error:
// Otherwise errors are ignored by `encoding/json`
// https://github.com/Sirupsen/logrus/issues/137
data[k] = v.Error()
default:
data[k] = v
}
}
prefixFieldClashes(data)
timestampFormat := f.TimestampFormat
if timestampFormat == "" {
timestampFormat = DefaultTimestampFormat
}
data["time"] = entry.Time.Format(timestampFormat)
data["msg"] = entry.Message
data["level"] = entry.Level.String()
serialized, err := json.Marshal(data)
if err != nil {
return nil, fmt.Errorf("Failed to marshal fields to JSON, %v", err)
}
return append(serialized, '\n'), nil
}

View file

@ -0,0 +1,212 @@
package logrus
import (
"io"
"os"
"sync"
)
type Logger struct {
// The logs are `io.Copy`'d to this in a mutex. It's common to set this to a
// file, or leave it default which is `os.Stderr`. You can also set this to
// something more adventorous, such as logging to Kafka.
Out io.Writer
// Hooks for the logger instance. These allow firing events based on logging
// levels and log entries. For example, to send errors to an error tracking
// service, log to StatsD or dump the core on fatal errors.
Hooks LevelHooks
// All log entries pass through the formatter before logged to Out. The
// included formatters are `TextFormatter` and `JSONFormatter` for which
// TextFormatter is the default. In development (when a TTY is attached) it
// logs with colors, but to a file it wouldn't. You can easily implement your
// own that implements the `Formatter` interface, see the `README` or included
// formatters for examples.
Formatter Formatter
// The logging level the logger should log at. This is typically (and defaults
// to) `logrus.Info`, which allows Info(), Warn(), Error() and Fatal() to be
// logged. `logrus.Debug` is useful in
Level Level
// Used to sync writing to the log.
mu sync.Mutex
}
// Creates a new logger. Configuration should be set by changing `Formatter`,
// `Out` and `Hooks` directly on the default logger instance. You can also just
// instantiate your own:
//
// var log = &Logger{
// Out: os.Stderr,
// Formatter: new(JSONFormatter),
// Hooks: make(LevelHooks),
// Level: logrus.DebugLevel,
// }
//
// It's recommended to make this a global instance called `log`.
func New() *Logger {
return &Logger{
Out: os.Stderr,
Formatter: new(TextFormatter),
Hooks: make(LevelHooks),
Level: InfoLevel,
}
}
// Adds a field to the log entry, note that it doesn't log until you call
// Debug, Print, Info, Warn, Fatal or Panic. It only creates a log entry.
// If you want multiple fields, use `WithFields`.
func (logger *Logger) WithField(key string, value interface{}) *Entry {
return NewEntry(logger).WithField(key, value)
}
// Adds a struct of fields to the log entry. All it does is call `WithField` for
// each `Field`.
func (logger *Logger) WithFields(fields Fields) *Entry {
return NewEntry(logger).WithFields(fields)
}
// Add an error as single field to the log entry. All it does is call
// `WithError` for the given `error`.
func (logger *Logger) WithError(err error) *Entry {
return NewEntry(logger).WithError(err)
}
func (logger *Logger) Debugf(format string, args ...interface{}) {
if logger.Level >= DebugLevel {
NewEntry(logger).Debugf(format, args...)
}
}
func (logger *Logger) Infof(format string, args ...interface{}) {
if logger.Level >= InfoLevel {
NewEntry(logger).Infof(format, args...)
}
}
func (logger *Logger) Printf(format string, args ...interface{}) {
NewEntry(logger).Printf(format, args...)
}
func (logger *Logger) Warnf(format string, args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warnf(format, args...)
}
}
func (logger *Logger) Warningf(format string, args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warnf(format, args...)
}
}
func (logger *Logger) Errorf(format string, args ...interface{}) {
if logger.Level >= ErrorLevel {
NewEntry(logger).Errorf(format, args...)
}
}
func (logger *Logger) Fatalf(format string, args ...interface{}) {
if logger.Level >= FatalLevel {
NewEntry(logger).Fatalf(format, args...)
}
Exit(1)
}
func (logger *Logger) Panicf(format string, args ...interface{}) {
if logger.Level >= PanicLevel {
NewEntry(logger).Panicf(format, args...)
}
}
func (logger *Logger) Debug(args ...interface{}) {
if logger.Level >= DebugLevel {
NewEntry(logger).Debug(args...)
}
}
func (logger *Logger) Info(args ...interface{}) {
if logger.Level >= InfoLevel {
NewEntry(logger).Info(args...)
}
}
func (logger *Logger) Print(args ...interface{}) {
NewEntry(logger).Info(args...)
}
func (logger *Logger) Warn(args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warn(args...)
}
}
func (logger *Logger) Warning(args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warn(args...)
}
}
func (logger *Logger) Error(args ...interface{}) {
if logger.Level >= ErrorLevel {
NewEntry(logger).Error(args...)
}
}
func (logger *Logger) Fatal(args ...interface{}) {
if logger.Level >= FatalLevel {
NewEntry(logger).Fatal(args...)
}
Exit(1)
}
func (logger *Logger) Panic(args ...interface{}) {
if logger.Level >= PanicLevel {
NewEntry(logger).Panic(args...)
}
}
func (logger *Logger) Debugln(args ...interface{}) {
if logger.Level >= DebugLevel {
NewEntry(logger).Debugln(args...)
}
}
func (logger *Logger) Infoln(args ...interface{}) {
if logger.Level >= InfoLevel {
NewEntry(logger).Infoln(args...)
}
}
func (logger *Logger) Println(args ...interface{}) {
NewEntry(logger).Println(args...)
}
func (logger *Logger) Warnln(args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warnln(args...)
}
}
func (logger *Logger) Warningln(args ...interface{}) {
if logger.Level >= WarnLevel {
NewEntry(logger).Warnln(args...)
}
}
func (logger *Logger) Errorln(args ...interface{}) {
if logger.Level >= ErrorLevel {
NewEntry(logger).Errorln(args...)
}
}
func (logger *Logger) Fatalln(args ...interface{}) {
if logger.Level >= FatalLevel {
NewEntry(logger).Fatalln(args...)
}
Exit(1)
}
func (logger *Logger) Panicln(args ...interface{}) {
if logger.Level >= PanicLevel {
NewEntry(logger).Panicln(args...)
}
}

View file

@ -0,0 +1,143 @@
package logrus
import (
"fmt"
"log"
"strings"
)
// Fields type, used to pass to `WithFields`.
type Fields map[string]interface{}
// Level type
type Level uint8
// Convert the Level to a string. E.g. PanicLevel becomes "panic".
func (level Level) String() string {
switch level {
case DebugLevel:
return "debug"
case InfoLevel:
return "info"
case WarnLevel:
return "warning"
case ErrorLevel:
return "error"
case FatalLevel:
return "fatal"
case PanicLevel:
return "panic"
}
return "unknown"
}
// ParseLevel takes a string level and returns the Logrus log level constant.
func ParseLevel(lvl string) (Level, error) {
switch strings.ToLower(lvl) {
case "panic":
return PanicLevel, nil
case "fatal":
return FatalLevel, nil
case "error":
return ErrorLevel, nil
case "warn", "warning":
return WarnLevel, nil
case "info":
return InfoLevel, nil
case "debug":
return DebugLevel, nil
}
var l Level
return l, fmt.Errorf("not a valid logrus Level: %q", lvl)
}
// A constant exposing all logging levels
var AllLevels = []Level{
PanicLevel,
FatalLevel,
ErrorLevel,
WarnLevel,
InfoLevel,
DebugLevel,
}
// These are the different logging levels. You can set the logging level to log
// on your instance of logger, obtained with `logrus.New()`.
const (
// PanicLevel level, highest level of severity. Logs and then calls panic with the
// message passed to Debug, Info, ...
PanicLevel Level = iota
// FatalLevel level. Logs and then calls `os.Exit(1)`. It will exit even if the
// logging level is set to Panic.
FatalLevel
// ErrorLevel level. Logs. Used for errors that should definitely be noted.
// Commonly used for hooks to send errors to an error tracking service.
ErrorLevel
// WarnLevel level. Non-critical entries that deserve eyes.
WarnLevel
// InfoLevel level. General operational entries about what's going on inside the
// application.
InfoLevel
// DebugLevel level. Usually only enabled when debugging. Very verbose logging.
DebugLevel
)
// Won't compile if StdLogger can't be realized by a log.Logger
var (
_ StdLogger = &log.Logger{}
_ StdLogger = &Entry{}
_ StdLogger = &Logger{}
)
// StdLogger is what your logrus-enabled library should take, that way
// it'll accept a stdlib logger and a logrus logger. There's no standard
// interface, this is the closest we get, unfortunately.
type StdLogger interface {
Print(...interface{})
Printf(string, ...interface{})
Println(...interface{})
Fatal(...interface{})
Fatalf(string, ...interface{})
Fatalln(...interface{})
Panic(...interface{})
Panicf(string, ...interface{})
Panicln(...interface{})
}
// The FieldLogger interface generalizes the Entry and Logger types
type FieldLogger interface {
WithField(key string, value interface{}) *Entry
WithFields(fields Fields) *Entry
WithError(err error) *Entry
Debugf(format string, args ...interface{})
Infof(format string, args ...interface{})
Printf(format string, args ...interface{})
Warnf(format string, args ...interface{})
Warningf(format string, args ...interface{})
Errorf(format string, args ...interface{})
Fatalf(format string, args ...interface{})
Panicf(format string, args ...interface{})
Debug(args ...interface{})
Info(args ...interface{})
Print(args ...interface{})
Warn(args ...interface{})
Warning(args ...interface{})
Error(args ...interface{})
Fatal(args ...interface{})
Panic(args ...interface{})
Debugln(args ...interface{})
Infoln(args ...interface{})
Println(args ...interface{})
Warnln(args ...interface{})
Warningln(args ...interface{})
Errorln(args ...interface{})
Fatalln(args ...interface{})
Panicln(args ...interface{})
}

View file

@ -0,0 +1,9 @@
// +build darwin freebsd openbsd netbsd dragonfly
package logrus
import "syscall"
const ioctlReadTermios = syscall.TIOCGETA
type Termios syscall.Termios

View file

@ -0,0 +1,12 @@
// Based on ssh/terminal:
// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package logrus
import "syscall"
const ioctlReadTermios = syscall.TCGETS
type Termios syscall.Termios

View file

@ -0,0 +1,21 @@
// Based on ssh/terminal:
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build linux darwin freebsd openbsd netbsd dragonfly
package logrus
import (
"syscall"
"unsafe"
)
// IsTerminal returns true if stderr's file descriptor is a terminal.
func IsTerminal() bool {
fd := syscall.Stderr
var termios Termios
_, _, err := syscall.Syscall6(syscall.SYS_IOCTL, uintptr(fd), ioctlReadTermios, uintptr(unsafe.Pointer(&termios)), 0, 0, 0)
return err == 0
}

View file

@ -0,0 +1,15 @@
// +build solaris
package logrus
import (
"os"
"golang.org/x/sys/unix"
)
// IsTerminal returns true if the given file descriptor is a terminal.
func IsTerminal() bool {
_, err := unix.IoctlGetTermios(int(os.Stdout.Fd()), unix.TCGETA)
return err == nil
}

View file

@ -0,0 +1,27 @@
// Based on ssh/terminal:
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build windows
package logrus
import (
"syscall"
"unsafe"
)
var kernel32 = syscall.NewLazyDLL("kernel32.dll")
var (
procGetConsoleMode = kernel32.NewProc("GetConsoleMode")
)
// IsTerminal returns true if stderr's file descriptor is a terminal.
func IsTerminal() bool {
fd := syscall.Stderr
var st uint32
r, _, e := syscall.Syscall(procGetConsoleMode.Addr(), 2, uintptr(fd), uintptr(unsafe.Pointer(&st)), 0)
return r != 0 && e == 0
}

View file

@ -0,0 +1,161 @@
package logrus
import (
"bytes"
"fmt"
"runtime"
"sort"
"strings"
"time"
)
const (
nocolor = 0
red = 31
green = 32
yellow = 33
blue = 34
gray = 37
)
var (
baseTimestamp time.Time
isTerminal bool
)
func init() {
baseTimestamp = time.Now()
isTerminal = IsTerminal()
}
func miniTS() int {
return int(time.Since(baseTimestamp) / time.Second)
}
type TextFormatter struct {
// Set to true to bypass checking for a TTY before outputting colors.
ForceColors bool
// Force disabling colors.
DisableColors bool
// Disable timestamp logging. useful when output is redirected to logging
// system that already adds timestamps.
DisableTimestamp bool
// Enable logging the full timestamp when a TTY is attached instead of just
// the time passed since beginning of execution.
FullTimestamp bool
// TimestampFormat to use for display when a full timestamp is printed
TimestampFormat string
// The fields are sorted by default for a consistent output. For applications
// that log extremely frequently and don't use the JSON formatter this may not
// be desired.
DisableSorting bool
}
func (f *TextFormatter) Format(entry *Entry) ([]byte, error) {
var keys []string = make([]string, 0, len(entry.Data))
for k := range entry.Data {
keys = append(keys, k)
}
if !f.DisableSorting {
sort.Strings(keys)
}
b := &bytes.Buffer{}
prefixFieldClashes(entry.Data)
isColorTerminal := isTerminal && (runtime.GOOS != "windows")
isColored := (f.ForceColors || isColorTerminal) && !f.DisableColors
timestampFormat := f.TimestampFormat
if timestampFormat == "" {
timestampFormat = DefaultTimestampFormat
}
if isColored {
f.printColored(b, entry, keys, timestampFormat)
} else {
if !f.DisableTimestamp {
f.appendKeyValue(b, "time", entry.Time.Format(timestampFormat))
}
f.appendKeyValue(b, "level", entry.Level.String())
if entry.Message != "" {
f.appendKeyValue(b, "msg", entry.Message)
}
for _, key := range keys {
f.appendKeyValue(b, key, entry.Data[key])
}
}
b.WriteByte('\n')
return b.Bytes(), nil
}
func (f *TextFormatter) printColored(b *bytes.Buffer, entry *Entry, keys []string, timestampFormat string) {
var levelColor int
switch entry.Level {
case DebugLevel:
levelColor = gray
case WarnLevel:
levelColor = yellow
case ErrorLevel, FatalLevel, PanicLevel:
levelColor = red
default:
levelColor = blue
}
levelText := strings.ToUpper(entry.Level.String())[0:4]
if !f.FullTimestamp {
fmt.Fprintf(b, "\x1b[%dm%s\x1b[0m[%04d] %-44s ", levelColor, levelText, miniTS(), entry.Message)
} else {
fmt.Fprintf(b, "\x1b[%dm%s\x1b[0m[%s] %-44s ", levelColor, levelText, entry.Time.Format(timestampFormat), entry.Message)
}
for _, k := range keys {
v := entry.Data[k]
fmt.Fprintf(b, " \x1b[%dm%s\x1b[0m=%+v", levelColor, k, v)
}
}
func needsQuoting(text string) bool {
for _, ch := range text {
if !((ch >= 'a' && ch <= 'z') ||
(ch >= 'A' && ch <= 'Z') ||
(ch >= '0' && ch <= '9') ||
ch == '-' || ch == '.') {
return true
}
}
return false
}
func (f *TextFormatter) appendKeyValue(b *bytes.Buffer, key string, value interface{}) {
b.WriteString(key)
b.WriteByte('=')
switch value := value.(type) {
case string:
if !needsQuoting(value) {
b.WriteString(value)
} else {
fmt.Fprintf(b, "%q", value)
}
case error:
errmsg := value.Error()
if !needsQuoting(errmsg) {
b.WriteString(errmsg)
} else {
fmt.Fprintf(b, "%q", value)
}
default:
fmt.Fprint(b, value)
}
b.WriteByte(' ')
}

View file

@ -0,0 +1,53 @@
package logrus
import (
"bufio"
"io"
"runtime"
)
func (logger *Logger) Writer() *io.PipeWriter {
return logger.WriterLevel(InfoLevel)
}
func (logger *Logger) WriterLevel(level Level) *io.PipeWriter {
reader, writer := io.Pipe()
var printFunc func(args ...interface{})
switch level {
case DebugLevel:
printFunc = logger.Debug
case InfoLevel:
printFunc = logger.Info
case WarnLevel:
printFunc = logger.Warn
case ErrorLevel:
printFunc = logger.Error
case FatalLevel:
printFunc = logger.Fatal
case PanicLevel:
printFunc = logger.Panic
default:
printFunc = logger.Print
}
go logger.writerScanner(reader, printFunc)
runtime.SetFinalizer(writer, writerFinalizer)
return writer
}
func (logger *Logger) writerScanner(reader *io.PipeReader, printFunc func(args ...interface{})) {
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
printFunc(scanner.Text())
}
if err := scanner.Err(); err != nil {
logger.Errorf("Error while reading from Writer: %s", err)
}
reader.Close()
}
func writerFinalizer(writer *io.PipeWriter) {
writer.Close()
}

View file

@ -0,0 +1,20 @@
The MIT License (MIT)
Copyright (c) 2013 Ben Johnson
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

View file

@ -0,0 +1,7 @@
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x7FFFFFFF // 2GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF

View file

@ -0,0 +1,7 @@
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View file

@ -0,0 +1,7 @@
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x7FFFFFFF // 2GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF

View file

@ -0,0 +1,9 @@
// +build arm64
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View file

@ -0,0 +1,10 @@
package bolt
import (
"syscall"
)
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
return syscall.Fdatasync(int(db.file.Fd()))
}

View file

@ -0,0 +1,27 @@
package bolt
import (
"syscall"
"unsafe"
)
const (
msAsync = 1 << iota // perform asynchronous writes
msSync // perform synchronous writes
msInvalidate // invalidate cached data
)
func msync(db *DB) error {
_, _, errno := syscall.Syscall(syscall.SYS_MSYNC, uintptr(unsafe.Pointer(db.data)), uintptr(db.datasz), msInvalidate)
if errno != 0 {
return errno
}
return nil
}
func fdatasync(db *DB) error {
if db.data != nil {
return msync(db)
}
return db.file.Sync()
}

View file

@ -0,0 +1,9 @@
// +build ppc
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x7FFFFFFF // 2GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF

View file

@ -0,0 +1,9 @@
// +build ppc64
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View file

@ -0,0 +1,9 @@
// +build ppc64le
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View file

@ -0,0 +1,9 @@
// +build s390x
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View file

@ -0,0 +1,89 @@
// +build !windows,!plan9,!solaris
package bolt
import (
"fmt"
"os"
"syscall"
"time"
"unsafe"
)
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
flag := syscall.LOCK_SH
if exclusive {
flag = syscall.LOCK_EX
}
// Otherwise attempt to obtain an exclusive lock.
err := syscall.Flock(int(db.file.Fd()), flag|syscall.LOCK_NB)
if err == nil {
return nil
} else if err != syscall.EWOULDBLOCK {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(db *DB) error {
return syscall.Flock(int(db.file.Fd()), syscall.LOCK_UN)
}
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Map the data file to memory.
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
if err != nil {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
db.datasz = sz
return nil
}
// munmap unmaps a DB's data file from memory.
func munmap(db *DB) error {
// Ignore the unmap if we have no mapped data.
if db.dataref == nil {
return nil
}
// Unmap using the original byte slice.
err := syscall.Munmap(db.dataref)
db.dataref = nil
db.data = nil
db.datasz = 0
return err
}
// NOTE: This function is copied from stdlib because it is not available on darwin.
func madvise(b []byte, advice int) (err error) {
_, _, e1 := syscall.Syscall(syscall.SYS_MADVISE, uintptr(unsafe.Pointer(&b[0])), uintptr(len(b)), uintptr(advice))
if e1 != 0 {
err = e1
}
return
}

View file

@ -0,0 +1,90 @@
package bolt
import (
"fmt"
"os"
"syscall"
"time"
"unsafe"
"golang.org/x/sys/unix"
)
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Pid = 0
lock.Whence = 0
lock.Pid = 0
if exclusive {
lock.Type = syscall.F_WRLCK
} else {
lock.Type = syscall.F_RDLCK
}
err := syscall.FcntlFlock(db.file.Fd(), syscall.F_SETLK, &lock)
if err == nil {
return nil
} else if err != syscall.EAGAIN {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(db *DB) error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Type = syscall.F_UNLCK
lock.Whence = 0
return syscall.FcntlFlock(uintptr(db.file.Fd()), syscall.F_SETLK, &lock)
}
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Map the data file to memory.
b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
if err != nil {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := unix.Madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
db.datasz = sz
return nil
}
// munmap unmaps a DB's data file from memory.
func munmap(db *DB) error {
// Ignore the unmap if we have no mapped data.
if db.dataref == nil {
return nil
}
// Unmap using the original byte slice.
err := unix.Munmap(db.dataref)
db.dataref = nil
db.data = nil
db.datasz = 0
return err
}

View file

@ -0,0 +1,144 @@
package bolt
import (
"fmt"
"os"
"syscall"
"time"
"unsafe"
)
// LockFileEx code derived from golang build filemutex_windows.go @ v1.5.1
var (
modkernel32 = syscall.NewLazyDLL("kernel32.dll")
procLockFileEx = modkernel32.NewProc("LockFileEx")
procUnlockFileEx = modkernel32.NewProc("UnlockFileEx")
)
const (
lockExt = ".lock"
// see https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx
flagLockExclusive = 2
flagLockFailImmediately = 1
// see https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx
errLockViolation syscall.Errno = 0x21
)
func lockFileEx(h syscall.Handle, flags, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
r, _, err := procLockFileEx.Call(uintptr(h), uintptr(flags), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)))
if r == 0 {
return err
}
return nil
}
func unlockFileEx(h syscall.Handle, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
r, _, err := procUnlockFileEx.Call(uintptr(h), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)), 0)
if r == 0 {
return err
}
return nil
}
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
return db.file.Sync()
}
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
// Create a separate lock file on windows because a process
// cannot share an exclusive lock on the same file. This is
// needed during Tx.WriteTo().
f, err := os.OpenFile(db.path+lockExt, os.O_CREATE, mode)
if err != nil {
return err
}
db.lockfile = f
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var flag uint32 = flagLockFailImmediately
if exclusive {
flag |= flagLockExclusive
}
err := lockFileEx(syscall.Handle(db.lockfile.Fd()), flag, 0, 1, 0, &syscall.Overlapped{})
if err == nil {
return nil
} else if err != errLockViolation {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(db *DB) error {
err := unlockFileEx(syscall.Handle(db.lockfile.Fd()), 0, 1, 0, &syscall.Overlapped{})
db.lockfile.Close()
os.Remove(db.path+lockExt)
return err
}
// mmap memory maps a DB's data file.
// Based on: https://github.com/edsrzf/mmap-go
func mmap(db *DB, sz int) error {
if !db.readOnly {
// Truncate the database to the size of the mmap.
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("truncate: %s", err)
}
}
// Open a file mapping handle.
sizelo := uint32(sz >> 32)
sizehi := uint32(sz) & 0xffffffff
h, errno := syscall.CreateFileMapping(syscall.Handle(db.file.Fd()), nil, syscall.PAGE_READONLY, sizelo, sizehi, nil)
if h == 0 {
return os.NewSyscallError("CreateFileMapping", errno)
}
// Create the memory map.
addr, errno := syscall.MapViewOfFile(h, syscall.FILE_MAP_READ, 0, 0, uintptr(sz))
if addr == 0 {
return os.NewSyscallError("MapViewOfFile", errno)
}
// Close mapping handle.
if err := syscall.CloseHandle(syscall.Handle(h)); err != nil {
return os.NewSyscallError("CloseHandle", err)
}
// Convert to a byte array.
db.data = ((*[maxMapSize]byte)(unsafe.Pointer(addr)))
db.datasz = sz
return nil
}
// munmap unmaps a pointer from a file.
// Based on: https://github.com/edsrzf/mmap-go
func munmap(db *DB) error {
if db.data == nil {
return nil
}
addr := (uintptr)(unsafe.Pointer(&db.data[0]))
if err := syscall.UnmapViewOfFile(addr); err != nil {
return os.NewSyscallError("UnmapViewOfFile", err)
}
return nil
}

View file

@ -0,0 +1,8 @@
// +build !windows,!plan9,!linux,!openbsd
package bolt
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
return db.file.Sync()
}

View file

@ -0,0 +1,748 @@
package bolt
import (
"bytes"
"fmt"
"unsafe"
)
const (
// MaxKeySize is the maximum length of a key, in bytes.
MaxKeySize = 32768
// MaxValueSize is the maximum length of a value, in bytes.
MaxValueSize = (1 << 31) - 2
)
const (
maxUint = ^uint(0)
minUint = 0
maxInt = int(^uint(0) >> 1)
minInt = -maxInt - 1
)
const bucketHeaderSize = int(unsafe.Sizeof(bucket{}))
const (
minFillPercent = 0.1
maxFillPercent = 1.0
)
// DefaultFillPercent is the percentage that split pages are filled.
// This value can be changed by setting Bucket.FillPercent.
const DefaultFillPercent = 0.5
// Bucket represents a collection of key/value pairs inside the database.
type Bucket struct {
*bucket
tx *Tx // the associated transaction
buckets map[string]*Bucket // subbucket cache
page *page // inline page reference
rootNode *node // materialized node for the root page.
nodes map[pgid]*node // node cache
// Sets the threshold for filling nodes when they split. By default,
// the bucket will fill to 50% but it can be useful to increase this
// amount if you know that your write workloads are mostly append-only.
//
// This is non-persisted across transactions so it must be set in every Tx.
FillPercent float64
}
// bucket represents the on-file representation of a bucket.
// This is stored as the "value" of a bucket key. If the bucket is small enough,
// then its root page can be stored inline in the "value", after the bucket
// header. In the case of inline buckets, the "root" will be 0.
type bucket struct {
root pgid // page id of the bucket's root-level page
sequence uint64 // monotonically incrementing, used by NextSequence()
}
// newBucket returns a new bucket associated with a transaction.
func newBucket(tx *Tx) Bucket {
var b = Bucket{tx: tx, FillPercent: DefaultFillPercent}
if tx.writable {
b.buckets = make(map[string]*Bucket)
b.nodes = make(map[pgid]*node)
}
return b
}
// Tx returns the tx of the bucket.
func (b *Bucket) Tx() *Tx {
return b.tx
}
// Root returns the root of the bucket.
func (b *Bucket) Root() pgid {
return b.root
}
// Writable returns whether the bucket is writable.
func (b *Bucket) Writable() bool {
return b.tx.writable
}
// Cursor creates a cursor associated with the bucket.
// The cursor is only valid as long as the transaction is open.
// Do not use a cursor after the transaction is closed.
func (b *Bucket) Cursor() *Cursor {
// Update transaction statistics.
b.tx.stats.CursorCount++
// Allocate and return a cursor.
return &Cursor{
bucket: b,
stack: make([]elemRef, 0),
}
}
// Bucket retrieves a nested bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) Bucket(name []byte) *Bucket {
if b.buckets != nil {
if child := b.buckets[string(name)]; child != nil {
return child
}
}
// Move cursor to key.
c := b.Cursor()
k, v, flags := c.seek(name)
// Return nil if the key doesn't exist or it is not a bucket.
if !bytes.Equal(name, k) || (flags&bucketLeafFlag) == 0 {
return nil
}
// Otherwise create a bucket and cache it.
var child = b.openBucket(v)
if b.buckets != nil {
b.buckets[string(name)] = child
}
return child
}
// Helper method that re-interprets a sub-bucket value
// from a parent into a Bucket
func (b *Bucket) openBucket(value []byte) *Bucket {
var child = newBucket(b.tx)
// If this is a writable transaction then we need to copy the bucket entry.
// Read-only transactions can point directly at the mmap entry.
if b.tx.writable {
child.bucket = &bucket{}
*child.bucket = *(*bucket)(unsafe.Pointer(&value[0]))
} else {
child.bucket = (*bucket)(unsafe.Pointer(&value[0]))
}
// Save a reference to the inline page if the bucket is inline.
if child.root == 0 {
child.page = (*page)(unsafe.Pointer(&value[bucketHeaderSize]))
}
return &child
}
// CreateBucket creates a new bucket at the given key and returns the new bucket.
// Returns an error if the key already exists, if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
if b.tx.db == nil {
return nil, ErrTxClosed
} else if !b.tx.writable {
return nil, ErrTxNotWritable
} else if len(key) == 0 {
return nil, ErrBucketNameRequired
}
// Move cursor to correct position.
c := b.Cursor()
k, _, flags := c.seek(key)
// Return an error if there is an existing key.
if bytes.Equal(key, k) {
if (flags & bucketLeafFlag) != 0 {
return nil, ErrBucketExists
} else {
return nil, ErrIncompatibleValue
}
}
// Create empty, inline bucket.
var bucket = Bucket{
bucket: &bucket{},
rootNode: &node{isLeaf: true},
FillPercent: DefaultFillPercent,
}
var value = bucket.write()
// Insert into node.
key = cloneBytes(key)
c.node().put(key, key, value, 0, bucketLeafFlag)
// Since subbuckets are not allowed on inline buckets, we need to
// dereference the inline page, if it exists. This will cause the bucket
// to be treated as a regular, non-inline bucket for the rest of the tx.
b.page = nil
return b.Bucket(key), nil
}
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist and returns a reference to it.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error) {
child, err := b.CreateBucket(key)
if err == ErrBucketExists {
return b.Bucket(key), nil
} else if err != nil {
return nil, err
}
return child, nil
}
// DeleteBucket deletes a bucket at the given key.
// Returns an error if the bucket does not exists, or if the key represents a non-bucket value.
func (b *Bucket) DeleteBucket(key []byte) error {
if b.tx.db == nil {
return ErrTxClosed
} else if !b.Writable() {
return ErrTxNotWritable
}
// Move cursor to correct position.
c := b.Cursor()
k, _, flags := c.seek(key)
// Return an error if bucket doesn't exist or is not a bucket.
if !bytes.Equal(key, k) {
return ErrBucketNotFound
} else if (flags & bucketLeafFlag) == 0 {
return ErrIncompatibleValue
}
// Recursively delete all child buckets.
child := b.Bucket(key)
err := child.ForEach(func(k, v []byte) error {
if v == nil {
if err := child.DeleteBucket(k); err != nil {
return fmt.Errorf("delete bucket: %s", err)
}
}
return nil
})
if err != nil {
return err
}
// Remove cached copy.
delete(b.buckets, string(key))
// Release all bucket pages to freelist.
child.nodes = nil
child.rootNode = nil
child.free()
// Delete the node if we have a matching key.
c.node().del(key)
return nil
}
// Get retrieves the value for a key in the bucket.
// Returns a nil value if the key does not exist or if the key is a nested bucket.
// The returned value is only valid for the life of the transaction.
func (b *Bucket) Get(key []byte) []byte {
k, v, flags := b.Cursor().seek(key)
// Return nil if this is a bucket.
if (flags & bucketLeafFlag) != 0 {
return nil
}
// If our target node isn't the same key as what's passed in then return nil.
if !bytes.Equal(key, k) {
return nil
}
return v
}
// Put sets the value for a key in the bucket.
// If the key exist then its previous value will be overwritten.
// Supplied value must remain valid for the life of the transaction.
// Returns an error if the bucket was created from a read-only transaction, if the key is blank, if the key is too large, or if the value is too large.
func (b *Bucket) Put(key []byte, value []byte) error {
if b.tx.db == nil {
return ErrTxClosed
} else if !b.Writable() {
return ErrTxNotWritable
} else if len(key) == 0 {
return ErrKeyRequired
} else if len(key) > MaxKeySize {
return ErrKeyTooLarge
} else if int64(len(value)) > MaxValueSize {
return ErrValueTooLarge
}
// Move cursor to correct position.
c := b.Cursor()
k, _, flags := c.seek(key)
// Return an error if there is an existing key with a bucket value.
if bytes.Equal(key, k) && (flags&bucketLeafFlag) != 0 {
return ErrIncompatibleValue
}
// Insert into node.
key = cloneBytes(key)
c.node().put(key, key, value, 0, 0)
return nil
}
// Delete removes a key from the bucket.
// If the key does not exist then nothing is done and a nil error is returned.
// Returns an error if the bucket was created from a read-only transaction.
func (b *Bucket) Delete(key []byte) error {
if b.tx.db == nil {
return ErrTxClosed
} else if !b.Writable() {
return ErrTxNotWritable
}
// Move cursor to correct position.
c := b.Cursor()
_, _, flags := c.seek(key)
// Return an error if there is already existing bucket value.
if (flags & bucketLeafFlag) != 0 {
return ErrIncompatibleValue
}
// Delete the node if we have a matching key.
c.node().del(key)
return nil
}
// NextSequence returns an autoincrementing integer for the bucket.
func (b *Bucket) NextSequence() (uint64, error) {
if b.tx.db == nil {
return 0, ErrTxClosed
} else if !b.Writable() {
return 0, ErrTxNotWritable
}
// Materialize the root node if it hasn't been already so that the
// bucket will be saved during commit.
if b.rootNode == nil {
_ = b.node(b.root, nil)
}
// Increment and return the sequence.
b.bucket.sequence++
return b.bucket.sequence, nil
}
// ForEach executes a function for each key/value pair in a bucket.
// If the provided function returns an error then the iteration is stopped and
// the error is returned to the caller. The provided function must not modify
// the bucket; this will result in undefined behavior.
func (b *Bucket) ForEach(fn func(k, v []byte) error) error {
if b.tx.db == nil {
return ErrTxClosed
}
c := b.Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
if err := fn(k, v); err != nil {
return err
}
}
return nil
}
// Stat returns stats on a bucket.
func (b *Bucket) Stats() BucketStats {
var s, subStats BucketStats
pageSize := b.tx.db.pageSize
s.BucketN += 1
if b.root == 0 {
s.InlineBucketN += 1
}
b.forEachPage(func(p *page, depth int) {
if (p.flags & leafPageFlag) != 0 {
s.KeyN += int(p.count)
// used totals the used bytes for the page
used := pageHeaderSize
if p.count != 0 {
// If page has any elements, add all element headers.
used += leafPageElementSize * int(p.count-1)
// Add all element key, value sizes.
// The computation takes advantage of the fact that the position
// of the last element's key/value equals to the total of the sizes
// of all previous elements' keys and values.
// It also includes the last element's header.
lastElement := p.leafPageElement(p.count - 1)
used += int(lastElement.pos + lastElement.ksize + lastElement.vsize)
}
if b.root == 0 {
// For inlined bucket just update the inline stats
s.InlineBucketInuse += used
} else {
// For non-inlined bucket update all the leaf stats
s.LeafPageN++
s.LeafInuse += used
s.LeafOverflowN += int(p.overflow)
// Collect stats from sub-buckets.
// Do that by iterating over all element headers
// looking for the ones with the bucketLeafFlag.
for i := uint16(0); i < p.count; i++ {
e := p.leafPageElement(i)
if (e.flags & bucketLeafFlag) != 0 {
// For any bucket element, open the element value
// and recursively call Stats on the contained bucket.
subStats.Add(b.openBucket(e.value()).Stats())
}
}
}
} else if (p.flags & branchPageFlag) != 0 {
s.BranchPageN++
lastElement := p.branchPageElement(p.count - 1)
// used totals the used bytes for the page
// Add header and all element headers.
used := pageHeaderSize + (branchPageElementSize * int(p.count-1))
// Add size of all keys and values.
// Again, use the fact that last element's position equals to
// the total of key, value sizes of all previous elements.
used += int(lastElement.pos + lastElement.ksize)
s.BranchInuse += used
s.BranchOverflowN += int(p.overflow)
}
// Keep track of maximum page depth.
if depth+1 > s.Depth {
s.Depth = (depth + 1)
}
})
// Alloc stats can be computed from page counts and pageSize.
s.BranchAlloc = (s.BranchPageN + s.BranchOverflowN) * pageSize
s.LeafAlloc = (s.LeafPageN + s.LeafOverflowN) * pageSize
// Add the max depth of sub-buckets to get total nested depth.
s.Depth += subStats.Depth
// Add the stats for all sub-buckets
s.Add(subStats)
return s
}
// forEachPage iterates over every page in a bucket, including inline pages.
func (b *Bucket) forEachPage(fn func(*page, int)) {
// If we have an inline page then just use that.
if b.page != nil {
fn(b.page, 0)
return
}
// Otherwise traverse the page hierarchy.
b.tx.forEachPage(b.root, 0, fn)
}
// forEachPageNode iterates over every page (or node) in a bucket.
// This also includes inline pages.
func (b *Bucket) forEachPageNode(fn func(*page, *node, int)) {
// If we have an inline page or root node then just use that.
if b.page != nil {
fn(b.page, nil, 0)
return
}
b._forEachPageNode(b.root, 0, fn)
}
func (b *Bucket) _forEachPageNode(pgid pgid, depth int, fn func(*page, *node, int)) {
var p, n = b.pageNode(pgid)
// Execute function.
fn(p, n, depth)
// Recursively loop over children.
if p != nil {
if (p.flags & branchPageFlag) != 0 {
for i := 0; i < int(p.count); i++ {
elem := p.branchPageElement(uint16(i))
b._forEachPageNode(elem.pgid, depth+1, fn)
}
}
} else {
if !n.isLeaf {
for _, inode := range n.inodes {
b._forEachPageNode(inode.pgid, depth+1, fn)
}
}
}
}
// spill writes all the nodes for this bucket to dirty pages.
func (b *Bucket) spill() error {
// Spill all child buckets first.
for name, child := range b.buckets {
// If the child bucket is small enough and it has no child buckets then
// write it inline into the parent bucket's page. Otherwise spill it
// like a normal bucket and make the parent value a pointer to the page.
var value []byte
if child.inlineable() {
child.free()
value = child.write()
} else {
if err := child.spill(); err != nil {
return err
}
// Update the child bucket header in this bucket.
value = make([]byte, unsafe.Sizeof(bucket{}))
var bucket = (*bucket)(unsafe.Pointer(&value[0]))
*bucket = *child.bucket
}
// Skip writing the bucket if there are no materialized nodes.
if child.rootNode == nil {
continue
}
// Update parent node.
var c = b.Cursor()
k, _, flags := c.seek([]byte(name))
if !bytes.Equal([]byte(name), k) {
panic(fmt.Sprintf("misplaced bucket header: %x -> %x", []byte(name), k))
}
if flags&bucketLeafFlag == 0 {
panic(fmt.Sprintf("unexpected bucket header flag: %x", flags))
}
c.node().put([]byte(name), []byte(name), value, 0, bucketLeafFlag)
}
// Ignore if there's not a materialized root node.
if b.rootNode == nil {
return nil
}
// Spill nodes.
if err := b.rootNode.spill(); err != nil {
return err
}
b.rootNode = b.rootNode.root()
// Update the root node for this bucket.
if b.rootNode.pgid >= b.tx.meta.pgid {
panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", b.rootNode.pgid, b.tx.meta.pgid))
}
b.root = b.rootNode.pgid
return nil
}
// inlineable returns true if a bucket is small enough to be written inline
// and if it contains no subbuckets. Otherwise returns false.
func (b *Bucket) inlineable() bool {
var n = b.rootNode
// Bucket must only contain a single leaf node.
if n == nil || !n.isLeaf {
return false
}
// Bucket is not inlineable if it contains subbuckets or if it goes beyond
// our threshold for inline bucket size.
var size = pageHeaderSize
for _, inode := range n.inodes {
size += leafPageElementSize + len(inode.key) + len(inode.value)
if inode.flags&bucketLeafFlag != 0 {
return false
} else if size > b.maxInlineBucketSize() {
return false
}
}
return true
}
// Returns the maximum total size of a bucket to make it a candidate for inlining.
func (b *Bucket) maxInlineBucketSize() int {
return b.tx.db.pageSize / 4
}
// write allocates and writes a bucket to a byte slice.
func (b *Bucket) write() []byte {
// Allocate the appropriate size.
var n = b.rootNode
var value = make([]byte, bucketHeaderSize+n.size())
// Write a bucket header.
var bucket = (*bucket)(unsafe.Pointer(&value[0]))
*bucket = *b.bucket
// Convert byte slice to a fake page and write the root node.
var p = (*page)(unsafe.Pointer(&value[bucketHeaderSize]))
n.write(p)
return value
}
// rebalance attempts to balance all nodes.
func (b *Bucket) rebalance() {
for _, n := range b.nodes {
n.rebalance()
}
for _, child := range b.buckets {
child.rebalance()
}
}
// node creates a node from a page and associates it with a given parent.
func (b *Bucket) node(pgid pgid, parent *node) *node {
_assert(b.nodes != nil, "nodes map expected")
// Retrieve node if it's already been created.
if n := b.nodes[pgid]; n != nil {
return n
}
// Otherwise create a node and cache it.
n := &node{bucket: b, parent: parent}
if parent == nil {
b.rootNode = n
} else {
parent.children = append(parent.children, n)
}
// Use the inline page if this is an inline bucket.
var p = b.page
if p == nil {
p = b.tx.page(pgid)
}
// Read the page into the node and cache it.
n.read(p)
b.nodes[pgid] = n
// Update statistics.
b.tx.stats.NodeCount++
return n
}
// free recursively frees all pages in the bucket.
func (b *Bucket) free() {
if b.root == 0 {
return
}
var tx = b.tx
b.forEachPageNode(func(p *page, n *node, _ int) {
if p != nil {
tx.db.freelist.free(tx.meta.txid, p)
} else {
n.free()
}
})
b.root = 0
}
// dereference removes all references to the old mmap.
func (b *Bucket) dereference() {
if b.rootNode != nil {
b.rootNode.root().dereference()
}
for _, child := range b.buckets {
child.dereference()
}
}
// pageNode returns the in-memory node, if it exists.
// Otherwise returns the underlying page.
func (b *Bucket) pageNode(id pgid) (*page, *node) {
// Inline buckets have a fake page embedded in their value so treat them
// differently. We'll return the rootNode (if available) or the fake page.
if b.root == 0 {
if id != 0 {
panic(fmt.Sprintf("inline bucket non-zero page access(2): %d != 0", id))
}
if b.rootNode != nil {
return nil, b.rootNode
}
return b.page, nil
}
// Check the node cache for non-inline buckets.
if b.nodes != nil {
if n := b.nodes[id]; n != nil {
return nil, n
}
}
// Finally lookup the page from the transaction if no node is materialized.
return b.tx.page(id), nil
}
// BucketStats records statistics about resources used by a bucket.
type BucketStats struct {
// Page count statistics.
BranchPageN int // number of logical branch pages
BranchOverflowN int // number of physical branch overflow pages
LeafPageN int // number of logical leaf pages
LeafOverflowN int // number of physical leaf overflow pages
// Tree statistics.
KeyN int // number of keys/value pairs
Depth int // number of levels in B+tree
// Page size utilization.
BranchAlloc int // bytes allocated for physical branch pages
BranchInuse int // bytes actually used for branch data
LeafAlloc int // bytes allocated for physical leaf pages
LeafInuse int // bytes actually used for leaf data
// Bucket statistics
BucketN int // total number of buckets including the top bucket
InlineBucketN int // total number on inlined buckets
InlineBucketInuse int // bytes used for inlined buckets (also accounted for in LeafInuse)
}
func (s *BucketStats) Add(other BucketStats) {
s.BranchPageN += other.BranchPageN
s.BranchOverflowN += other.BranchOverflowN
s.LeafPageN += other.LeafPageN
s.LeafOverflowN += other.LeafOverflowN
s.KeyN += other.KeyN
if s.Depth < other.Depth {
s.Depth = other.Depth
}
s.BranchAlloc += other.BranchAlloc
s.BranchInuse += other.BranchInuse
s.LeafAlloc += other.LeafAlloc
s.LeafInuse += other.LeafInuse
s.BucketN += other.BucketN
s.InlineBucketN += other.InlineBucketN
s.InlineBucketInuse += other.InlineBucketInuse
}
// cloneBytes returns a copy of a given slice.
func cloneBytes(v []byte) []byte {
var clone = make([]byte, len(v))
copy(clone, v)
return clone
}

View file

@ -0,0 +1,400 @@
package bolt
import (
"bytes"
"fmt"
"sort"
)
// Cursor represents an iterator that can traverse over all key/value pairs in a bucket in sorted order.
// Cursors see nested buckets with value == nil.
// Cursors can be obtained from a transaction and are valid as long as the transaction is open.
//
// Keys and values returned from the cursor are only valid for the life of the transaction.
//
// Changing data while traversing with a cursor may cause it to be invalidated
// and return unexpected keys and/or values. You must reposition your cursor
// after mutating data.
type Cursor struct {
bucket *Bucket
stack []elemRef
}
// Bucket returns the bucket that this cursor was created from.
func (c *Cursor) Bucket() *Bucket {
return c.bucket
}
// First moves the cursor to the first item in the bucket and returns its key and value.
// If the bucket is empty then a nil key and value are returned.
// The returned key and value are only valid for the life of the transaction.
func (c *Cursor) First() (key []byte, value []byte) {
_assert(c.bucket.tx.db != nil, "tx closed")
c.stack = c.stack[:0]
p, n := c.bucket.pageNode(c.bucket.root)
c.stack = append(c.stack, elemRef{page: p, node: n, index: 0})
c.first()
// If we land on an empty page then move to the next value.
// https://github.com/boltdb/bolt/issues/450
if c.stack[len(c.stack)-1].count() == 0 {
c.next()
}
k, v, flags := c.keyValue()
if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
}
return k, v
}
// Last moves the cursor to the last item in the bucket and returns its key and value.
// If the bucket is empty then a nil key and value are returned.
// The returned key and value are only valid for the life of the transaction.
func (c *Cursor) Last() (key []byte, value []byte) {
_assert(c.bucket.tx.db != nil, "tx closed")
c.stack = c.stack[:0]
p, n := c.bucket.pageNode(c.bucket.root)
ref := elemRef{page: p, node: n}
ref.index = ref.count() - 1
c.stack = append(c.stack, ref)
c.last()
k, v, flags := c.keyValue()
if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
}
return k, v
}
// Next moves the cursor to the next item in the bucket and returns its key and value.
// If the cursor is at the end of the bucket then a nil key and value are returned.
// The returned key and value are only valid for the life of the transaction.
func (c *Cursor) Next() (key []byte, value []byte) {
_assert(c.bucket.tx.db != nil, "tx closed")
k, v, flags := c.next()
if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
}
return k, v
}
// Prev moves the cursor to the previous item in the bucket and returns its key and value.
// If the cursor is at the beginning of the bucket then a nil key and value are returned.
// The returned key and value are only valid for the life of the transaction.
func (c *Cursor) Prev() (key []byte, value []byte) {
_assert(c.bucket.tx.db != nil, "tx closed")
// Attempt to move back one element until we're successful.
// Move up the stack as we hit the beginning of each page in our stack.
for i := len(c.stack) - 1; i >= 0; i-- {
elem := &c.stack[i]
if elem.index > 0 {
elem.index--
break
}
c.stack = c.stack[:i]
}
// If we've hit the end then return nil.
if len(c.stack) == 0 {
return nil, nil
}
// Move down the stack to find the last element of the last leaf under this branch.
c.last()
k, v, flags := c.keyValue()
if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
}
return k, v
}
// Seek moves the cursor to a given key and returns it.
// If the key does not exist then the next key is used. If no keys
// follow, a nil key is returned.
// The returned key and value are only valid for the life of the transaction.
func (c *Cursor) Seek(seek []byte) (key []byte, value []byte) {
k, v, flags := c.seek(seek)
// If we ended up after the last element of a page then move to the next one.
if ref := &c.stack[len(c.stack)-1]; ref.index >= ref.count() {
k, v, flags = c.next()
}
if k == nil {
return nil, nil
} else if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
}
return k, v
}
// Delete removes the current key/value under the cursor from the bucket.
// Delete fails if current key/value is a bucket or if the transaction is not writable.
func (c *Cursor) Delete() error {
if c.bucket.tx.db == nil {
return ErrTxClosed
} else if !c.bucket.Writable() {
return ErrTxNotWritable
}
key, _, flags := c.keyValue()
// Return an error if current value is a bucket.
if (flags & bucketLeafFlag) != 0 {
return ErrIncompatibleValue
}
c.node().del(key)
return nil
}
// seek moves the cursor to a given key and returns it.
// If the key does not exist then the next key is used.
func (c *Cursor) seek(seek []byte) (key []byte, value []byte, flags uint32) {
_assert(c.bucket.tx.db != nil, "tx closed")
// Start from root page/node and traverse to correct page.
c.stack = c.stack[:0]
c.search(seek, c.bucket.root)
ref := &c.stack[len(c.stack)-1]
// If the cursor is pointing to the end of page/node then return nil.
if ref.index >= ref.count() {
return nil, nil, 0
}
// If this is a bucket then return a nil value.
return c.keyValue()
}
// first moves the cursor to the first leaf element under the last page in the stack.
func (c *Cursor) first() {
for {
// Exit when we hit a leaf page.
var ref = &c.stack[len(c.stack)-1]
if ref.isLeaf() {
break
}
// Keep adding pages pointing to the first element to the stack.
var pgid pgid
if ref.node != nil {
pgid = ref.node.inodes[ref.index].pgid
} else {
pgid = ref.page.branchPageElement(uint16(ref.index)).pgid
}
p, n := c.bucket.pageNode(pgid)
c.stack = append(c.stack, elemRef{page: p, node: n, index: 0})
}
}
// last moves the cursor to the last leaf element under the last page in the stack.
func (c *Cursor) last() {
for {
// Exit when we hit a leaf page.
ref := &c.stack[len(c.stack)-1]
if ref.isLeaf() {
break
}
// Keep adding pages pointing to the last element in the stack.
var pgid pgid
if ref.node != nil {
pgid = ref.node.inodes[ref.index].pgid
} else {
pgid = ref.page.branchPageElement(uint16(ref.index)).pgid
}
p, n := c.bucket.pageNode(pgid)
var nextRef = elemRef{page: p, node: n}
nextRef.index = nextRef.count() - 1
c.stack = append(c.stack, nextRef)
}
}
// next moves to the next leaf element and returns the key and value.
// If the cursor is at the last leaf element then it stays there and returns nil.
func (c *Cursor) next() (key []byte, value []byte, flags uint32) {
for {
// Attempt to move over one element until we're successful.
// Move up the stack as we hit the end of each page in our stack.
var i int
for i = len(c.stack) - 1; i >= 0; i-- {
elem := &c.stack[i]
if elem.index < elem.count()-1 {
elem.index++
break
}
}
// If we've hit the root page then stop and return. This will leave the
// cursor on the last element of the last page.
if i == -1 {
return nil, nil, 0
}
// Otherwise start from where we left off in the stack and find the
// first element of the first leaf page.
c.stack = c.stack[:i+1]
c.first()
// If this is an empty page then restart and move back up the stack.
// https://github.com/boltdb/bolt/issues/450
if c.stack[len(c.stack)-1].count() == 0 {
continue
}
return c.keyValue()
}
}
// search recursively performs a binary search against a given page/node until it finds a given key.
func (c *Cursor) search(key []byte, pgid pgid) {
p, n := c.bucket.pageNode(pgid)
if p != nil && (p.flags&(branchPageFlag|leafPageFlag)) == 0 {
panic(fmt.Sprintf("invalid page type: %d: %x", p.id, p.flags))
}
e := elemRef{page: p, node: n}
c.stack = append(c.stack, e)
// If we're on a leaf page/node then find the specific node.
if e.isLeaf() {
c.nsearch(key)
return
}
if n != nil {
c.searchNode(key, n)
return
}
c.searchPage(key, p)
}
func (c *Cursor) searchNode(key []byte, n *node) {
var exact bool
index := sort.Search(len(n.inodes), func(i int) bool {
// TODO(benbjohnson): Optimize this range search. It's a bit hacky right now.
// sort.Search() finds the lowest index where f() != -1 but we need the highest index.
ret := bytes.Compare(n.inodes[i].key, key)
if ret == 0 {
exact = true
}
return ret != -1
})
if !exact && index > 0 {
index--
}
c.stack[len(c.stack)-1].index = index
// Recursively search to the next page.
c.search(key, n.inodes[index].pgid)
}
func (c *Cursor) searchPage(key []byte, p *page) {
// Binary search for the correct range.
inodes := p.branchPageElements()
var exact bool
index := sort.Search(int(p.count), func(i int) bool {
// TODO(benbjohnson): Optimize this range search. It's a bit hacky right now.
// sort.Search() finds the lowest index where f() != -1 but we need the highest index.
ret := bytes.Compare(inodes[i].key(), key)
if ret == 0 {
exact = true
}
return ret != -1
})
if !exact && index > 0 {
index--
}
c.stack[len(c.stack)-1].index = index
// Recursively search to the next page.
c.search(key, inodes[index].pgid)
}
// nsearch searches the leaf node on the top of the stack for a key.
func (c *Cursor) nsearch(key []byte) {
e := &c.stack[len(c.stack)-1]
p, n := e.page, e.node
// If we have a node then search its inodes.
if n != nil {
index := sort.Search(len(n.inodes), func(i int) bool {
return bytes.Compare(n.inodes[i].key, key) != -1
})
e.index = index
return
}
// If we have a page then search its leaf elements.
inodes := p.leafPageElements()
index := sort.Search(int(p.count), func(i int) bool {
return bytes.Compare(inodes[i].key(), key) != -1
})
e.index = index
}
// keyValue returns the key and value of the current leaf element.
func (c *Cursor) keyValue() ([]byte, []byte, uint32) {
ref := &c.stack[len(c.stack)-1]
if ref.count() == 0 || ref.index >= ref.count() {
return nil, nil, 0
}
// Retrieve value from node.
if ref.node != nil {
inode := &ref.node.inodes[ref.index]
return inode.key, inode.value, inode.flags
}
// Or retrieve value from page.
elem := ref.page.leafPageElement(uint16(ref.index))
return elem.key(), elem.value(), elem.flags
}
// node returns the node that the cursor is currently positioned on.
func (c *Cursor) node() *node {
_assert(len(c.stack) > 0, "accessing a node with a zero-length cursor stack")
// If the top of the stack is a leaf node then just return it.
if ref := &c.stack[len(c.stack)-1]; ref.node != nil && ref.isLeaf() {
return ref.node
}
// Start from root and traverse down the hierarchy.
var n = c.stack[0].node
if n == nil {
n = c.bucket.node(c.stack[0].page.id, nil)
}
for _, ref := range c.stack[:len(c.stack)-1] {
_assert(!n.isLeaf, "expected branch node")
n = n.childAt(int(ref.index))
}
_assert(n.isLeaf, "expected leaf node")
return n
}
// elemRef represents a reference to an element on a given page/node.
type elemRef struct {
page *page
node *node
index int
}
// isLeaf returns whether the ref is pointing at a leaf page/node.
func (r *elemRef) isLeaf() bool {
if r.node != nil {
return r.node.isLeaf
}
return (r.page.flags & leafPageFlag) != 0
}
// count returns the number of inodes or page elements.
func (r *elemRef) count() int {
if r.node != nil {
return len(r.node.inodes)
}
return int(r.page.count)
}

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,44 @@
/*
Package bolt implements a low-level key/value store in pure Go. It supports
fully serializable transactions, ACID semantics, and lock-free MVCC with
multiple readers and a single writer. Bolt can be used for projects that
want a simple data store without the need to add large dependencies such as
Postgres or MySQL.
Bolt is a single-level, zero-copy, B+tree data store. This means that Bolt is
optimized for fast read access and does not require recovery in the event of a
system crash. Transactions which have not finished committing will simply be
rolled back in the event of a crash.
The design of Bolt is based on Howard Chu's LMDB database project.
Bolt currently works on Windows, Mac OS X, and Linux.
Basics
There are only a few types in Bolt: DB, Bucket, Tx, and Cursor. The DB is
a collection of buckets and is represented by a single file on disk. A bucket is
a collection of unique keys that are associated with values.
Transactions provide either read-only or read-write access to the database.
Read-only transactions can retrieve key/value pairs and can use Cursors to
iterate over the dataset sequentially. Read-write transactions can create and
delete buckets and can insert and remove keys. Only one read-write transaction
is allowed at a time.
Caveats
The database uses a read-only, memory-mapped data file to ensure that
applications cannot corrupt the database, however, this means that keys and
values returned from Bolt cannot be changed. Writing to a read-only byte slice
will cause Go to panic.
Keys and values retrieved from the database are only valid for the life of
the transaction. When used outside the transaction, these byte slices can
point to different data or can point to invalid memory which will cause a panic.
*/
package bolt

View file

@ -0,0 +1,71 @@
package bolt
import "errors"
// These errors can be returned when opening or calling methods on a DB.
var (
// ErrDatabaseNotOpen is returned when a DB instance is accessed before it
// is opened or after it is closed.
ErrDatabaseNotOpen = errors.New("database not open")
// ErrDatabaseOpen is returned when opening a database that is
// already open.
ErrDatabaseOpen = errors.New("database already open")
// ErrInvalid is returned when both meta pages on a database are invalid.
// This typically occurs when a file is not a bolt database.
ErrInvalid = errors.New("invalid database")
// ErrVersionMismatch is returned when the data file was created with a
// different version of Bolt.
ErrVersionMismatch = errors.New("version mismatch")
// ErrChecksum is returned when either meta page checksum does not match.
ErrChecksum = errors.New("checksum error")
// ErrTimeout is returned when a database cannot obtain an exclusive lock
// on the data file after the timeout passed to Open().
ErrTimeout = errors.New("timeout")
)
// These errors can occur when beginning or committing a Tx.
var (
// ErrTxNotWritable is returned when performing a write operation on a
// read-only transaction.
ErrTxNotWritable = errors.New("tx not writable")
// ErrTxClosed is returned when committing or rolling back a transaction
// that has already been committed or rolled back.
ErrTxClosed = errors.New("tx closed")
// ErrDatabaseReadOnly is returned when a mutating transaction is started on a
// read-only database.
ErrDatabaseReadOnly = errors.New("database is in read-only mode")
)
// These errors can occur when putting or deleting a value or a bucket.
var (
// ErrBucketNotFound is returned when trying to access a bucket that has
// not been created yet.
ErrBucketNotFound = errors.New("bucket not found")
// ErrBucketExists is returned when creating a bucket that already exists.
ErrBucketExists = errors.New("bucket already exists")
// ErrBucketNameRequired is returned when creating a bucket with a blank name.
ErrBucketNameRequired = errors.New("bucket name required")
// ErrKeyRequired is returned when inserting a zero-length key.
ErrKeyRequired = errors.New("key required")
// ErrKeyTooLarge is returned when inserting a key that is larger than MaxKeySize.
ErrKeyTooLarge = errors.New("key too large")
// ErrValueTooLarge is returned when inserting a value that is larger than MaxValueSize.
ErrValueTooLarge = errors.New("value too large")
// ErrIncompatibleValue is returned when trying create or delete a bucket
// on an existing non-bucket key or when trying to create or delete a
// non-bucket key on an existing bucket key.
ErrIncompatibleValue = errors.New("incompatible value")
)

View file

@ -0,0 +1,242 @@
package bolt
import (
"fmt"
"sort"
"unsafe"
)
// freelist represents a list of all pages that are available for allocation.
// It also tracks pages that have been freed but are still in use by open transactions.
type freelist struct {
ids []pgid // all free and available free page ids.
pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
cache map[pgid]bool // fast lookup of all free and pending page ids.
}
// newFreelist returns an empty, initialized freelist.
func newFreelist() *freelist {
return &freelist{
pending: make(map[txid][]pgid),
cache: make(map[pgid]bool),
}
}
// size returns the size of the page after serialization.
func (f *freelist) size() int {
return pageHeaderSize + (int(unsafe.Sizeof(pgid(0))) * f.count())
}
// count returns count of pages on the freelist
func (f *freelist) count() int {
return f.free_count() + f.pending_count()
}
// free_count returns count of free pages
func (f *freelist) free_count() int {
return len(f.ids)
}
// pending_count returns count of pending pages
func (f *freelist) pending_count() int {
var count int
for _, list := range f.pending {
count += len(list)
}
return count
}
// all returns a list of all free ids and all pending ids in one sorted list.
func (f *freelist) all() []pgid {
m := make(pgids, 0)
for _, list := range f.pending {
m = append(m, list...)
}
sort.Sort(m)
return pgids(f.ids).merge(m)
}
// allocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) allocate(n int) pgid {
if len(f.ids) == 0 {
return 0
}
var initial, previd pgid
for i, id := range f.ids {
if id <= 1 {
panic(fmt.Sprintf("invalid page allocation: %d", id))
}
// Reset initial page if this is not contiguous.
if previd == 0 || id-previd != 1 {
initial = id
}
// If we found a contiguous block then remove it and return it.
if (id-initial)+1 == pgid(n) {
// If we're allocating off the beginning then take the fast path
// and just adjust the existing slice. This will use extra memory
// temporarily but the append() in free() will realloc the slice
// as is necessary.
if (i + 1) == n {
f.ids = f.ids[i+1:]
} else {
copy(f.ids[i-n+1:], f.ids[i+1:])
f.ids = f.ids[:len(f.ids)-n]
}
// Remove from the free cache.
for i := pgid(0); i < pgid(n); i++ {
delete(f.cache, initial+i)
}
return initial
}
previd = id
}
return 0
}
// free releases a page and its overflow for a given transaction id.
// If the page is already free then a panic will occur.
func (f *freelist) free(txid txid, p *page) {
if p.id <= 1 {
panic(fmt.Sprintf("cannot free page 0 or 1: %d", p.id))
}
// Free page and all its overflow pages.
var ids = f.pending[txid]
for id := p.id; id <= p.id+pgid(p.overflow); id++ {
// Verify that page is not already free.
if f.cache[id] {
panic(fmt.Sprintf("page %d already freed", id))
}
// Add to the freelist and cache.
ids = append(ids, id)
f.cache[id] = true
}
f.pending[txid] = ids
}
// release moves all page ids for a transaction id (or older) to the freelist.
func (f *freelist) release(txid txid) {
m := make(pgids, 0)
for tid, ids := range f.pending {
if tid <= txid {
// Move transaction's pending pages to the available freelist.
// Don't remove from the cache since the page is still free.
m = append(m, ids...)
delete(f.pending, tid)
}
}
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
// rollback removes the pages from a given pending tx.
func (f *freelist) rollback(txid txid) {
// Remove page ids from cache.
for _, id := range f.pending[txid] {
delete(f.cache, id)
}
// Remove pages from pending list.
delete(f.pending, txid)
}
// freed returns whether a given page is in the free list.
func (f *freelist) freed(pgid pgid) bool {
return f.cache[pgid]
}
// read initializes the freelist from a freelist page.
func (f *freelist) read(p *page) {
// If the page.count is at the max uint16 value (64k) then it's considered
// an overflow and the size of the freelist is stored as the first element.
idx, count := 0, int(p.count)
if count == 0xFFFF {
idx = 1
count = int(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0])
}
// Copy the list of page ids from the freelist.
ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
f.ids = make([]pgid, len(ids))
copy(f.ids, ids)
// Make sure they're sorted.
sort.Sort(pgids(f.ids))
// Rebuild the page cache.
f.reindex()
}
// write writes the page ids onto a freelist page. All free and pending ids are
// saved to disk since in the event of a program crash, all pending ids will
// become free.
func (f *freelist) write(p *page) error {
// Combine the old free pgids and pgids waiting on an open transaction.
ids := f.all()
// Update the header flag.
p.flags |= freelistPageFlag
// The page.count can only hold up to 64k elements so if we overflow that
// number then we handle it by putting the size in the first element.
if len(ids) < 0xFFFF {
p.count = uint16(len(ids))
copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[:], ids)
} else {
p.count = 0xFFFF
((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0] = pgid(len(ids))
copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[1:], ids)
}
return nil
}
// reload reads the freelist from a page and filters out pending items.
func (f *freelist) reload(p *page) {
f.read(p)
// Build a cache of only pending pages.
pcache := make(map[pgid]bool)
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
pcache[pendingID] = true
}
}
// Check each page in the freelist and build a new available freelist
// with any pages not in the pending lists.
var a []pgid
for _, id := range f.ids {
if !pcache[id] {
a = append(a, id)
}
}
f.ids = a
// Once the available list is rebuilt then rebuild the free cache so that
// it includes the available and pending free pages.
f.reindex()
}
// reindex rebuilds the free cache based on available and pending free lists.
func (f *freelist) reindex() {
f.cache = make(map[pgid]bool)
for _, id := range f.ids {
f.cache[id] = true
}
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
f.cache[pendingID] = true
}
}
}

View file

@ -0,0 +1,599 @@
package bolt
import (
"bytes"
"fmt"
"sort"
"unsafe"
)
// node represents an in-memory, deserialized page.
type node struct {
bucket *Bucket
isLeaf bool
unbalanced bool
spilled bool
key []byte
pgid pgid
parent *node
children nodes
inodes inodes
}
// root returns the top-level node this node is attached to.
func (n *node) root() *node {
if n.parent == nil {
return n
}
return n.parent.root()
}
// minKeys returns the minimum number of inodes this node should have.
func (n *node) minKeys() int {
if n.isLeaf {
return 1
}
return 2
}
// size returns the size of the node after serialization.
func (n *node) size() int {
sz, elsz := pageHeaderSize, n.pageElementSize()
for i := 0; i < len(n.inodes); i++ {
item := &n.inodes[i]
sz += elsz + len(item.key) + len(item.value)
}
return sz
}
// sizeLessThan returns true if the node is less than a given size.
// This is an optimization to avoid calculating a large node when we only need
// to know if it fits inside a certain page size.
func (n *node) sizeLessThan(v int) bool {
sz, elsz := pageHeaderSize, n.pageElementSize()
for i := 0; i < len(n.inodes); i++ {
item := &n.inodes[i]
sz += elsz + len(item.key) + len(item.value)
if sz >= v {
return false
}
}
return true
}
// pageElementSize returns the size of each page element based on the type of node.
func (n *node) pageElementSize() int {
if n.isLeaf {
return leafPageElementSize
}
return branchPageElementSize
}
// childAt returns the child node at a given index.
func (n *node) childAt(index int) *node {
if n.isLeaf {
panic(fmt.Sprintf("invalid childAt(%d) on a leaf node", index))
}
return n.bucket.node(n.inodes[index].pgid, n)
}
// childIndex returns the index of a given child node.
func (n *node) childIndex(child *node) int {
index := sort.Search(len(n.inodes), func(i int) bool { return bytes.Compare(n.inodes[i].key, child.key) != -1 })
return index
}
// numChildren returns the number of children.
func (n *node) numChildren() int {
return len(n.inodes)
}
// nextSibling returns the next node with the same parent.
func (n *node) nextSibling() *node {
if n.parent == nil {
return nil
}
index := n.parent.childIndex(n)
if index >= n.parent.numChildren()-1 {
return nil
}
return n.parent.childAt(index + 1)
}
// prevSibling returns the previous node with the same parent.
func (n *node) prevSibling() *node {
if n.parent == nil {
return nil
}
index := n.parent.childIndex(n)
if index == 0 {
return nil
}
return n.parent.childAt(index - 1)
}
// put inserts a key/value.
func (n *node) put(oldKey, newKey, value []byte, pgid pgid, flags uint32) {
if pgid >= n.bucket.tx.meta.pgid {
panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", pgid, n.bucket.tx.meta.pgid))
} else if len(oldKey) <= 0 {
panic("put: zero-length old key")
} else if len(newKey) <= 0 {
panic("put: zero-length new key")
}
// Find insertion index.
index := sort.Search(len(n.inodes), func(i int) bool { return bytes.Compare(n.inodes[i].key, oldKey) != -1 })
// Add capacity and shift nodes if we don't have an exact match and need to insert.
exact := (len(n.inodes) > 0 && index < len(n.inodes) && bytes.Equal(n.inodes[index].key, oldKey))
if !exact {
n.inodes = append(n.inodes, inode{})
copy(n.inodes[index+1:], n.inodes[index:])
}
inode := &n.inodes[index]
inode.flags = flags
inode.key = newKey
inode.value = value
inode.pgid = pgid
_assert(len(inode.key) > 0, "put: zero-length inode key")
}
// del removes a key from the node.
func (n *node) del(key []byte) {
// Find index of key.
index := sort.Search(len(n.inodes), func(i int) bool { return bytes.Compare(n.inodes[i].key, key) != -1 })
// Exit if the key isn't found.
if index >= len(n.inodes) || !bytes.Equal(n.inodes[index].key, key) {
return
}
// Delete inode from the node.
n.inodes = append(n.inodes[:index], n.inodes[index+1:]...)
// Mark the node as needing rebalancing.
n.unbalanced = true
}
// read initializes the node from a page.
func (n *node) read(p *page) {
n.pgid = p.id
n.isLeaf = ((p.flags & leafPageFlag) != 0)
n.inodes = make(inodes, int(p.count))
for i := 0; i < int(p.count); i++ {
inode := &n.inodes[i]
if n.isLeaf {
elem := p.leafPageElement(uint16(i))
inode.flags = elem.flags
inode.key = elem.key()
inode.value = elem.value()
} else {
elem := p.branchPageElement(uint16(i))
inode.pgid = elem.pgid
inode.key = elem.key()
}
_assert(len(inode.key) > 0, "read: zero-length inode key")
}
// Save first key so we can find the node in the parent when we spill.
if len(n.inodes) > 0 {
n.key = n.inodes[0].key
_assert(len(n.key) > 0, "read: zero-length node key")
} else {
n.key = nil
}
}
// write writes the items onto one or more pages.
func (n *node) write(p *page) {
// Initialize page.
if n.isLeaf {
p.flags |= leafPageFlag
} else {
p.flags |= branchPageFlag
}
if len(n.inodes) >= 0xFFFF {
panic(fmt.Sprintf("inode overflow: %d (pgid=%d)", len(n.inodes), p.id))
}
p.count = uint16(len(n.inodes))
// Loop over each item and write it to the page.
b := (*[maxAllocSize]byte)(unsafe.Pointer(&p.ptr))[n.pageElementSize()*len(n.inodes):]
for i, item := range n.inodes {
_assert(len(item.key) > 0, "write: zero-length inode key")
// Write the page element.
if n.isLeaf {
elem := p.leafPageElement(uint16(i))
elem.pos = uint32(uintptr(unsafe.Pointer(&b[0])) - uintptr(unsafe.Pointer(elem)))
elem.flags = item.flags
elem.ksize = uint32(len(item.key))
elem.vsize = uint32(len(item.value))
} else {
elem := p.branchPageElement(uint16(i))
elem.pos = uint32(uintptr(unsafe.Pointer(&b[0])) - uintptr(unsafe.Pointer(elem)))
elem.ksize = uint32(len(item.key))
elem.pgid = item.pgid
_assert(elem.pgid != p.id, "write: circular dependency occurred")
}
// If the length of key+value is larger than the max allocation size
// then we need to reallocate the byte array pointer.
//
// See: https://github.com/boltdb/bolt/pull/335
klen, vlen := len(item.key), len(item.value)
if len(b) < klen+vlen {
b = (*[maxAllocSize]byte)(unsafe.Pointer(&b[0]))[:]
}
// Write data for the element to the end of the page.
copy(b[0:], item.key)
b = b[klen:]
copy(b[0:], item.value)
b = b[vlen:]
}
// DEBUG ONLY: n.dump()
}
// split breaks up a node into multiple smaller nodes, if appropriate.
// This should only be called from the spill() function.
func (n *node) split(pageSize int) []*node {
var nodes []*node
node := n
for {
// Split node into two.
a, b := node.splitTwo(pageSize)
nodes = append(nodes, a)
// If we can't split then exit the loop.
if b == nil {
break
}
// Set node to b so it gets split on the next iteration.
node = b
}
return nodes
}
// splitTwo breaks up a node into two smaller nodes, if appropriate.
// This should only be called from the split() function.
func (n *node) splitTwo(pageSize int) (*node, *node) {
// Ignore the split if the page doesn't have at least enough nodes for
// two pages or if the nodes can fit in a single page.
if len(n.inodes) <= (minKeysPerPage*2) || n.sizeLessThan(pageSize) {
return n, nil
}
// Determine the threshold before starting a new node.
var fillPercent = n.bucket.FillPercent
if fillPercent < minFillPercent {
fillPercent = minFillPercent
} else if fillPercent > maxFillPercent {
fillPercent = maxFillPercent
}
threshold := int(float64(pageSize) * fillPercent)
// Determine split position and sizes of the two pages.
splitIndex, _ := n.splitIndex(threshold)
// Split node into two separate nodes.
// If there's no parent then we'll need to create one.
if n.parent == nil {
n.parent = &node{bucket: n.bucket, children: []*node{n}}
}
// Create a new node and add it to the parent.
next := &node{bucket: n.bucket, isLeaf: n.isLeaf, parent: n.parent}
n.parent.children = append(n.parent.children, next)
// Split inodes across two nodes.
next.inodes = n.inodes[splitIndex:]
n.inodes = n.inodes[:splitIndex]
// Update the statistics.
n.bucket.tx.stats.Split++
return n, next
}
// splitIndex finds the position where a page will fill a given threshold.
// It returns the index as well as the size of the first page.
// This is only be called from split().
func (n *node) splitIndex(threshold int) (index, sz int) {
sz = pageHeaderSize
// Loop until we only have the minimum number of keys required for the second page.
for i := 0; i < len(n.inodes)-minKeysPerPage; i++ {
index = i
inode := n.inodes[i]
elsize := n.pageElementSize() + len(inode.key) + len(inode.value)
// If we have at least the minimum number of keys and adding another
// node would put us over the threshold then exit and return.
if i >= minKeysPerPage && sz+elsize > threshold {
break
}
// Add the element size to the total size.
sz += elsize
}
return
}
// spill writes the nodes to dirty pages and splits nodes as it goes.
// Returns an error if dirty pages cannot be allocated.
func (n *node) spill() error {
var tx = n.bucket.tx
if n.spilled {
return nil
}
// Spill child nodes first. Child nodes can materialize sibling nodes in
// the case of split-merge so we cannot use a range loop. We have to check
// the children size on every loop iteration.
sort.Sort(n.children)
for i := 0; i < len(n.children); i++ {
if err := n.children[i].spill(); err != nil {
return err
}
}
// We no longer need the child list because it's only used for spill tracking.
n.children = nil
// Split nodes into appropriate sizes. The first node will always be n.
var nodes = n.split(tx.db.pageSize)
for _, node := range nodes {
// Add node's page to the freelist if it's not new.
if node.pgid > 0 {
tx.db.freelist.free(tx.meta.txid, tx.page(node.pgid))
node.pgid = 0
}
// Allocate contiguous space for the node.
p, err := tx.allocate((node.size() / tx.db.pageSize) + 1)
if err != nil {
return err
}
// Write the node.
if p.id >= tx.meta.pgid {
panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", p.id, tx.meta.pgid))
}
node.pgid = p.id
node.write(p)
node.spilled = true
// Insert into parent inodes.
if node.parent != nil {
var key = node.key
if key == nil {
key = node.inodes[0].key
}
node.parent.put(key, node.inodes[0].key, nil, node.pgid, 0)
node.key = node.inodes[0].key
_assert(len(node.key) > 0, "spill: zero-length node key")
}
// Update the statistics.
tx.stats.Spill++
}
// If the root node split and created a new root then we need to spill that
// as well. We'll clear out the children to make sure it doesn't try to respill.
if n.parent != nil && n.parent.pgid == 0 {
n.children = nil
return n.parent.spill()
}
return nil
}
// rebalance attempts to combine the node with sibling nodes if the node fill
// size is below a threshold or if there are not enough keys.
func (n *node) rebalance() {
if !n.unbalanced {
return
}
n.unbalanced = false
// Update statistics.
n.bucket.tx.stats.Rebalance++
// Ignore if node is above threshold (25%) and has enough keys.
var threshold = n.bucket.tx.db.pageSize / 4
if n.size() > threshold && len(n.inodes) > n.minKeys() {
return
}
// Root node has special handling.
if n.parent == nil {
// If root node is a branch and only has one node then collapse it.
if !n.isLeaf && len(n.inodes) == 1 {
// Move root's child up.
child := n.bucket.node(n.inodes[0].pgid, n)
n.isLeaf = child.isLeaf
n.inodes = child.inodes[:]
n.children = child.children
// Reparent all child nodes being moved.
for _, inode := range n.inodes {
if child, ok := n.bucket.nodes[inode.pgid]; ok {
child.parent = n
}
}
// Remove old child.
child.parent = nil
delete(n.bucket.nodes, child.pgid)
child.free()
}
return
}
// If node has no keys then just remove it.
if n.numChildren() == 0 {
n.parent.del(n.key)
n.parent.removeChild(n)
delete(n.bucket.nodes, n.pgid)
n.free()
n.parent.rebalance()
return
}
_assert(n.parent.numChildren() > 1, "parent must have at least 2 children")
// Destination node is right sibling if idx == 0, otherwise left sibling.
var target *node
var useNextSibling = (n.parent.childIndex(n) == 0)
if useNextSibling {
target = n.nextSibling()
} else {
target = n.prevSibling()
}
// If both this node and the target node are too small then merge them.
if useNextSibling {
// Reparent all child nodes being moved.
for _, inode := range target.inodes {
if child, ok := n.bucket.nodes[inode.pgid]; ok {
child.parent.removeChild(child)
child.parent = n
child.parent.children = append(child.parent.children, child)
}
}
// Copy over inodes from target and remove target.
n.inodes = append(n.inodes, target.inodes...)
n.parent.del(target.key)
n.parent.removeChild(target)
delete(n.bucket.nodes, target.pgid)
target.free()
} else {
// Reparent all child nodes being moved.
for _, inode := range n.inodes {
if child, ok := n.bucket.nodes[inode.pgid]; ok {
child.parent.removeChild(child)
child.parent = target
child.parent.children = append(child.parent.children, child)
}
}
// Copy over inodes to target and remove node.
target.inodes = append(target.inodes, n.inodes...)
n.parent.del(n.key)
n.parent.removeChild(n)
delete(n.bucket.nodes, n.pgid)
n.free()
}
// Either this node or the target node was deleted from the parent so rebalance it.
n.parent.rebalance()
}
// removes a node from the list of in-memory children.
// This does not affect the inodes.
func (n *node) removeChild(target *node) {
for i, child := range n.children {
if child == target {
n.children = append(n.children[:i], n.children[i+1:]...)
return
}
}
}
// dereference causes the node to copy all its inode key/value references to heap memory.
// This is required when the mmap is reallocated so inodes are not pointing to stale data.
func (n *node) dereference() {
if n.key != nil {
key := make([]byte, len(n.key))
copy(key, n.key)
n.key = key
_assert(n.pgid == 0 || len(n.key) > 0, "dereference: zero-length node key on existing node")
}
for i := range n.inodes {
inode := &n.inodes[i]
key := make([]byte, len(inode.key))
copy(key, inode.key)
inode.key = key
_assert(len(inode.key) > 0, "dereference: zero-length inode key")
value := make([]byte, len(inode.value))
copy(value, inode.value)
inode.value = value
}
// Recursively dereference children.
for _, child := range n.children {
child.dereference()
}
// Update statistics.
n.bucket.tx.stats.NodeDeref++
}
// free adds the node's underlying page to the freelist.
func (n *node) free() {
if n.pgid != 0 {
n.bucket.tx.db.freelist.free(n.bucket.tx.meta.txid, n.bucket.tx.page(n.pgid))
n.pgid = 0
}
}
// dump writes the contents of the node to STDERR for debugging purposes.
/*
func (n *node) dump() {
// Write node header.
var typ = "branch"
if n.isLeaf {
typ = "leaf"
}
warnf("[NODE %d {type=%s count=%d}]", n.pgid, typ, len(n.inodes))
// Write out abbreviated version of each item.
for _, item := range n.inodes {
if n.isLeaf {
if item.flags&bucketLeafFlag != 0 {
bucket := (*bucket)(unsafe.Pointer(&item.value[0]))
warnf("+L %08x -> (bucket root=%d)", trunc(item.key, 4), bucket.root)
} else {
warnf("+L %08x -> %08x", trunc(item.key, 4), trunc(item.value, 4))
}
} else {
warnf("+B %08x -> pgid=%d", trunc(item.key, 4), item.pgid)
}
}
warn("")
}
*/
type nodes []*node
func (s nodes) Len() int { return len(s) }
func (s nodes) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s nodes) Less(i, j int) bool { return bytes.Compare(s[i].inodes[0].key, s[j].inodes[0].key) == -1 }
// inode represents an internal node inside of a node.
// It can be used to point to elements in a page or point
// to an element which hasn't been added to a page yet.
type inode struct {
flags uint32
pgid pgid
key []byte
value []byte
}
type inodes []inode

View file

@ -0,0 +1,172 @@
package bolt
import (
"fmt"
"os"
"sort"
"unsafe"
)
const pageHeaderSize = int(unsafe.Offsetof(((*page)(nil)).ptr))
const minKeysPerPage = 2
const branchPageElementSize = int(unsafe.Sizeof(branchPageElement{}))
const leafPageElementSize = int(unsafe.Sizeof(leafPageElement{}))
const (
branchPageFlag = 0x01
leafPageFlag = 0x02
metaPageFlag = 0x04
freelistPageFlag = 0x10
)
const (
bucketLeafFlag = 0x01
)
type pgid uint64
type page struct {
id pgid
flags uint16
count uint16
overflow uint32
ptr uintptr
}
// typ returns a human readable page type string used for debugging.
func (p *page) typ() string {
if (p.flags & branchPageFlag) != 0 {
return "branch"
} else if (p.flags & leafPageFlag) != 0 {
return "leaf"
} else if (p.flags & metaPageFlag) != 0 {
return "meta"
} else if (p.flags & freelistPageFlag) != 0 {
return "freelist"
}
return fmt.Sprintf("unknown<%02x>", p.flags)
}
// meta returns a pointer to the metadata section of the page.
func (p *page) meta() *meta {
return (*meta)(unsafe.Pointer(&p.ptr))
}
// leafPageElement retrieves the leaf node by index
func (p *page) leafPageElement(index uint16) *leafPageElement {
n := &((*[0x7FFFFFF]leafPageElement)(unsafe.Pointer(&p.ptr)))[index]
return n
}
// leafPageElements retrieves a list of leaf nodes.
func (p *page) leafPageElements() []leafPageElement {
return ((*[0x7FFFFFF]leafPageElement)(unsafe.Pointer(&p.ptr)))[:]
}
// branchPageElement retrieves the branch node by index
func (p *page) branchPageElement(index uint16) *branchPageElement {
return &((*[0x7FFFFFF]branchPageElement)(unsafe.Pointer(&p.ptr)))[index]
}
// branchPageElements retrieves a list of branch nodes.
func (p *page) branchPageElements() []branchPageElement {
return ((*[0x7FFFFFF]branchPageElement)(unsafe.Pointer(&p.ptr)))[:]
}
// dump writes n bytes of the page to STDERR as hex output.
func (p *page) hexdump(n int) {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:n]
fmt.Fprintf(os.Stderr, "%x\n", buf)
}
type pages []*page
func (s pages) Len() int { return len(s) }
func (s pages) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s pages) Less(i, j int) bool { return s[i].id < s[j].id }
// branchPageElement represents a node on a branch page.
type branchPageElement struct {
pos uint32
ksize uint32
pgid pgid
}
// key returns a byte slice of the node key.
func (n *branchPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}
// leafPageElement represents a node on a leaf page.
type leafPageElement struct {
flags uint32
pos uint32
ksize uint32
vsize uint32
}
// key returns a byte slice of the node key.
func (n *leafPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize:n.ksize]
}
// value returns a byte slice of the node value.
func (n *leafPageElement) value() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos+n.ksize]))[:n.vsize:n.vsize]
}
// PageInfo represents human readable information about a page.
type PageInfo struct {
ID int
Type string
Count int
OverflowCount int
}
type pgids []pgid
func (s pgids) Len() int { return len(s) }
func (s pgids) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
// merge returns the sorted union of a and b.
func (a pgids) merge(b pgids) pgids {
// Return the opposite slice if one is nil.
if len(a) == 0 {
return b
} else if len(b) == 0 {
return a
}
// Create a list to hold all elements from both lists.
merged := make(pgids, 0, len(a)+len(b))
// Assign lead to the slice with a lower starting value, follow to the higher value.
lead, follow := a, b
if b[0] < a[0] {
lead, follow = b, a
}
// Continue while there are elements in the lead.
for len(lead) > 0 {
// Merge largest prefix of lead that is ahead of follow[0].
n := sort.Search(len(lead), func(i int) bool { return lead[i] > follow[0] })
merged = append(merged, lead[:n]...)
if n >= len(lead) {
break
}
// Swap lead and follow.
lead, follow = follow, lead[n:]
}
// Append what's left in follow.
merged = append(merged, follow...)
return merged
}

View file

@ -0,0 +1,682 @@
package bolt
import (
"fmt"
"io"
"os"
"sort"
"strings"
"time"
"unsafe"
)
// txid represents the internal transaction identifier.
type txid uint64
// Tx represents a read-only or read/write transaction on the database.
// Read-only transactions can be used for retrieving values for keys and creating cursors.
// Read/write transactions can create and remove buckets and create and remove keys.
//
// IMPORTANT: You must commit or rollback transactions when you are done with
// them. Pages can not be reclaimed by the writer until no more transactions
// are using them. A long running read transaction can cause the database to
// quickly grow.
type Tx struct {
writable bool
managed bool
db *DB
meta *meta
root Bucket
pages map[pgid]*page
stats TxStats
commitHandlers []func()
// WriteFlag specifies the flag for write-related methods like WriteTo().
// Tx opens the database file with the specified flag to copy the data.
//
// By default, the flag is unset, which works well for mostly in-memory
// workloads. For databases that are much larger than available RAM,
// set the flag to syscall.O_DIRECT to avoid trashing the page cache.
WriteFlag int
}
// init initializes the transaction.
func (tx *Tx) init(db *DB) {
tx.db = db
tx.pages = nil
// Copy the meta page since it can be changed by the writer.
tx.meta = &meta{}
db.meta().copy(tx.meta)
// Copy over the root bucket.
tx.root = newBucket(tx)
tx.root.bucket = &bucket{}
*tx.root.bucket = tx.meta.root
// Increment the transaction id and add a page cache for writable transactions.
if tx.writable {
tx.pages = make(map[pgid]*page)
tx.meta.txid += txid(1)
}
}
// ID returns the transaction id.
func (tx *Tx) ID() int {
return int(tx.meta.txid)
}
// DB returns a reference to the database that created the transaction.
func (tx *Tx) DB() *DB {
return tx.db
}
// Size returns current database size in bytes as seen by this transaction.
func (tx *Tx) Size() int64 {
return int64(tx.meta.pgid) * int64(tx.db.pageSize)
}
// Writable returns whether the transaction can perform write operations.
func (tx *Tx) Writable() bool {
return tx.writable
}
// Cursor creates a cursor associated with the root bucket.
// All items in the cursor will return a nil value because all root bucket keys point to buckets.
// The cursor is only valid as long as the transaction is open.
// Do not use a cursor after the transaction is closed.
func (tx *Tx) Cursor() *Cursor {
return tx.root.Cursor()
}
// Stats retrieves a copy of the current transaction statistics.
func (tx *Tx) Stats() TxStats {
return tx.stats
}
// Bucket retrieves a bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) Bucket(name []byte) *Bucket {
return tx.root.Bucket(name)
}
// CreateBucket creates a new bucket.
// Returns an error if the bucket already exists, if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucket(name []byte) (*Bucket, error) {
return tx.root.CreateBucket(name)
}
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucketIfNotExists(name []byte) (*Bucket, error) {
return tx.root.CreateBucketIfNotExists(name)
}
// DeleteBucket deletes a bucket.
// Returns an error if the bucket cannot be found or if the key represents a non-bucket value.
func (tx *Tx) DeleteBucket(name []byte) error {
return tx.root.DeleteBucket(name)
}
// ForEach executes a function for each bucket in the root.
// If the provided function returns an error then the iteration is stopped and
// the error is returned to the caller.
func (tx *Tx) ForEach(fn func(name []byte, b *Bucket) error) error {
return tx.root.ForEach(func(k, v []byte) error {
if err := fn(k, tx.root.Bucket(k)); err != nil {
return err
}
return nil
})
}
// OnCommit adds a handler function to be executed after the transaction successfully commits.
func (tx *Tx) OnCommit(fn func()) {
tx.commitHandlers = append(tx.commitHandlers, fn)
}
// Commit writes all changes to disk and updates the meta page.
// Returns an error if a disk write error occurs, or if Commit is
// called on a read-only transaction.
func (tx *Tx) Commit() error {
_assert(!tx.managed, "managed tx commit not allowed")
if tx.db == nil {
return ErrTxClosed
} else if !tx.writable {
return ErrTxNotWritable
}
// TODO(benbjohnson): Use vectorized I/O to write out dirty pages.
// Rebalance nodes which have had deletions.
var startTime = time.Now()
tx.root.rebalance()
if tx.stats.Rebalance > 0 {
tx.stats.RebalanceTime += time.Since(startTime)
}
// spill data onto dirty pages.
startTime = time.Now()
if err := tx.root.spill(); err != nil {
tx.rollback()
return err
}
tx.stats.SpillTime += time.Since(startTime)
// Free the old root bucket.
tx.meta.root.root = tx.root.root
opgid := tx.meta.pgid
// Free the freelist and allocate new pages for it. This will overestimate
// the size of the freelist but not underestimate the size (which would be bad).
tx.db.freelist.free(tx.meta.txid, tx.db.page(tx.meta.freelist))
p, err := tx.allocate((tx.db.freelist.size() / tx.db.pageSize) + 1)
if err != nil {
tx.rollback()
return err
}
if err := tx.db.freelist.write(p); err != nil {
tx.rollback()
return err
}
tx.meta.freelist = p.id
// If the high water mark has moved up then attempt to grow the database.
if tx.meta.pgid > opgid {
if err := tx.db.grow(int(tx.meta.pgid+1) * tx.db.pageSize); err != nil {
tx.rollback()
return err
}
}
// Write dirty pages to disk.
startTime = time.Now()
if err := tx.write(); err != nil {
tx.rollback()
return err
}
// If strict mode is enabled then perform a consistency check.
// Only the first consistency error is reported in the panic.
if tx.db.StrictMode {
ch := tx.Check()
var errs []string
for {
err, ok := <-ch
if !ok {
break
}
errs = append(errs, err.Error())
}
if len(errs) > 0 {
panic("check fail: " + strings.Join(errs, "\n"))
}
}
// Write meta to disk.
if err := tx.writeMeta(); err != nil {
tx.rollback()
return err
}
tx.stats.WriteTime += time.Since(startTime)
// Finalize the transaction.
tx.close()
// Execute commit handlers now that the locks have been removed.
for _, fn := range tx.commitHandlers {
fn()
}
return nil
}
// Rollback closes the transaction and ignores all previous updates. Read-only
// transactions must be rolled back and not committed.
func (tx *Tx) Rollback() error {
_assert(!tx.managed, "managed tx rollback not allowed")
if tx.db == nil {
return ErrTxClosed
}
tx.rollback()
return nil
}
func (tx *Tx) rollback() {
if tx.db == nil {
return
}
if tx.writable {
tx.db.freelist.rollback(tx.meta.txid)
tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))
}
tx.close()
}
func (tx *Tx) close() {
if tx.db == nil {
return
}
if tx.writable {
// Grab freelist stats.
var freelistFreeN = tx.db.freelist.free_count()
var freelistPendingN = tx.db.freelist.pending_count()
var freelistAlloc = tx.db.freelist.size()
// Remove transaction ref & writer lock.
tx.db.rwtx = nil
tx.db.rwlock.Unlock()
// Merge statistics.
tx.db.statlock.Lock()
tx.db.stats.FreePageN = freelistFreeN
tx.db.stats.PendingPageN = freelistPendingN
tx.db.stats.FreeAlloc = (freelistFreeN + freelistPendingN) * tx.db.pageSize
tx.db.stats.FreelistInuse = freelistAlloc
tx.db.stats.TxStats.add(&tx.stats)
tx.db.statlock.Unlock()
} else {
tx.db.removeTx(tx)
}
// Clear all references.
tx.db = nil
tx.meta = nil
tx.root = Bucket{tx: tx}
tx.pages = nil
}
// Copy writes the entire database to a writer.
// This function exists for backwards compatibility. Use WriteTo() instead.
func (tx *Tx) Copy(w io.Writer) error {
_, err := tx.WriteTo(w)
return err
}
// WriteTo writes the entire database to a writer.
// If err == nil then exactly tx.Size() bytes will be written into the writer.
func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
// Attempt to open reader with WriteFlag
f, err := os.OpenFile(tx.db.path, os.O_RDONLY|tx.WriteFlag, 0)
if err != nil {
return 0, err
}
defer func() { _ = f.Close() }()
// Generate a meta page. We use the same page data for both meta pages.
buf := make([]byte, tx.db.pageSize)
page := (*page)(unsafe.Pointer(&buf[0]))
page.flags = metaPageFlag
*page.meta() = *tx.meta
// Write meta 0.
page.id = 0
page.meta().checksum = page.meta().sum64()
nn, err := w.Write(buf)
n += int64(nn)
if err != nil {
return n, fmt.Errorf("meta 0 copy: %s", err)
}
// Write meta 1 with a lower transaction id.
page.id = 1
page.meta().txid -= 1
page.meta().checksum = page.meta().sum64()
nn, err = w.Write(buf)
n += int64(nn)
if err != nil {
return n, fmt.Errorf("meta 1 copy: %s", err)
}
// Move past the meta pages in the file.
if _, err := f.Seek(int64(tx.db.pageSize*2), os.SEEK_SET); err != nil {
return n, fmt.Errorf("seek: %s", err)
}
// Copy data pages.
wn, err := io.CopyN(w, f, tx.Size()-int64(tx.db.pageSize*2))
n += wn
if err != nil {
return n, err
}
return n, f.Close()
}
// CopyFile copies the entire database to file at the given path.
// A reader transaction is maintained during the copy so it is safe to continue
// using the database while a copy is in progress.
func (tx *Tx) CopyFile(path string, mode os.FileMode) error {
f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE|os.O_TRUNC, mode)
if err != nil {
return err
}
err = tx.Copy(f)
if err != nil {
_ = f.Close()
return err
}
return f.Close()
}
// Check performs several consistency checks on the database for this transaction.
// An error is returned if any inconsistency is found.
//
// It can be safely run concurrently on a writable transaction. However, this
// incurs a high cost for large databases and databases with a lot of subbuckets
// because of caching. This overhead can be removed if running on a read-only
// transaction, however, it is not safe to execute other writer transactions at
// the same time.
func (tx *Tx) Check() <-chan error {
ch := make(chan error)
go tx.check(ch)
return ch
}
func (tx *Tx) check(ch chan error) {
// Check if any pages are double freed.
freed := make(map[pgid]bool)
for _, id := range tx.db.freelist.all() {
if freed[id] {
ch <- fmt.Errorf("page %d: already freed", id)
}
freed[id] = true
}
// Track every reachable page.
reachable := make(map[pgid]*page)
reachable[0] = tx.page(0) // meta0
reachable[1] = tx.page(1) // meta1
for i := uint32(0); i <= tx.page(tx.meta.freelist).overflow; i++ {
reachable[tx.meta.freelist+pgid(i)] = tx.page(tx.meta.freelist)
}
// Recursively check buckets.
tx.checkBucket(&tx.root, reachable, freed, ch)
// Ensure all pages below high water mark are either reachable or freed.
for i := pgid(0); i < tx.meta.pgid; i++ {
_, isReachable := reachable[i]
if !isReachable && !freed[i] {
ch <- fmt.Errorf("page %d: unreachable unfreed", int(i))
}
}
// Close the channel to signal completion.
close(ch)
}
func (tx *Tx) checkBucket(b *Bucket, reachable map[pgid]*page, freed map[pgid]bool, ch chan error) {
// Ignore inline buckets.
if b.root == 0 {
return
}
// Check every page used by this bucket.
b.tx.forEachPage(b.root, 0, func(p *page, _ int) {
if p.id > tx.meta.pgid {
ch <- fmt.Errorf("page %d: out of bounds: %d", int(p.id), int(b.tx.meta.pgid))
}
// Ensure each page is only referenced once.
for i := pgid(0); i <= pgid(p.overflow); i++ {
var id = p.id + i
if _, ok := reachable[id]; ok {
ch <- fmt.Errorf("page %d: multiple references", int(id))
}
reachable[id] = p
}
// We should only encounter un-freed leaf and branch pages.
if freed[p.id] {
ch <- fmt.Errorf("page %d: reachable freed", int(p.id))
} else if (p.flags&branchPageFlag) == 0 && (p.flags&leafPageFlag) == 0 {
ch <- fmt.Errorf("page %d: invalid type: %s", int(p.id), p.typ())
}
})
// Check each bucket within this bucket.
_ = b.ForEach(func(k, v []byte) error {
if child := b.Bucket(k); child != nil {
tx.checkBucket(child, reachable, freed, ch)
}
return nil
})
}
// allocate returns a contiguous block of memory starting at a given page.
func (tx *Tx) allocate(count int) (*page, error) {
p, err := tx.db.allocate(count)
if err != nil {
return nil, err
}
// Save to our page cache.
tx.pages[p.id] = p
// Update statistics.
tx.stats.PageCount++
tx.stats.PageAlloc += count * tx.db.pageSize
return p, nil
}
// write writes any dirty pages to disk.
func (tx *Tx) write() error {
// Sort pages by id.
pages := make(pages, 0, len(tx.pages))
for _, p := range tx.pages {
pages = append(pages, p)
}
// Clear out page cache early.
tx.pages = make(map[pgid]*page)
sort.Sort(pages)
// Write pages to disk in order.
for _, p := range pages {
size := (int(p.overflow) + 1) * tx.db.pageSize
offset := int64(p.id) * int64(tx.db.pageSize)
// Write out page in "max allocation" sized chunks.
ptr := (*[maxAllocSize]byte)(unsafe.Pointer(p))
for {
// Limit our write to our max allocation size.
sz := size
if sz > maxAllocSize-1 {
sz = maxAllocSize - 1
}
// Write chunk to disk.
buf := ptr[:sz]
if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
return err
}
// Update statistics.
tx.stats.Write++
// Exit inner for loop if we've written all the chunks.
size -= sz
if size == 0 {
break
}
// Otherwise move offset forward and move pointer to next chunk.
offset += int64(sz)
ptr = (*[maxAllocSize]byte)(unsafe.Pointer(&ptr[sz]))
}
}
// Ignore file sync if flag is set on DB.
if !tx.db.NoSync || IgnoreNoSync {
if err := fdatasync(tx.db); err != nil {
return err
}
}
// Put small pages back to page pool.
for _, p := range pages {
// Ignore page sizes over 1 page.
// These are allocated using make() instead of the page pool.
if int(p.overflow) != 0 {
continue
}
buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:tx.db.pageSize]
// See https://go.googlesource.com/go/+/f03c9202c43e0abb130669852082117ca50aa9b1
for i := range buf {
buf[i] = 0
}
tx.db.pagePool.Put(buf)
}
return nil
}
// writeMeta writes the meta to the disk.
func (tx *Tx) writeMeta() error {
// Create a temporary buffer for the meta page.
buf := make([]byte, tx.db.pageSize)
p := tx.db.pageInBuffer(buf, 0)
tx.meta.write(p)
// Write the meta page to file.
if _, err := tx.db.ops.writeAt(buf, int64(p.id)*int64(tx.db.pageSize)); err != nil {
return err
}
if !tx.db.NoSync || IgnoreNoSync {
if err := fdatasync(tx.db); err != nil {
return err
}
}
// Update statistics.
tx.stats.Write++
return nil
}
// page returns a reference to the page with a given id.
// If page has been written to then a temporary buffered page is returned.
func (tx *Tx) page(id pgid) *page {
// Check the dirty pages first.
if tx.pages != nil {
if p, ok := tx.pages[id]; ok {
return p
}
}
// Otherwise return directly from the mmap.
return tx.db.page(id)
}
// forEachPage iterates over every page within a given page and executes a function.
func (tx *Tx) forEachPage(pgid pgid, depth int, fn func(*page, int)) {
p := tx.page(pgid)
// Execute function.
fn(p, depth)
// Recursively loop over children.
if (p.flags & branchPageFlag) != 0 {
for i := 0; i < int(p.count); i++ {
elem := p.branchPageElement(uint16(i))
tx.forEachPage(elem.pgid, depth+1, fn)
}
}
}
// Page returns page information for a given page number.
// This is only safe for concurrent use when used by a writable transaction.
func (tx *Tx) Page(id int) (*PageInfo, error) {
if tx.db == nil {
return nil, ErrTxClosed
} else if pgid(id) >= tx.meta.pgid {
return nil, nil
}
// Build the page info.
p := tx.db.page(pgid(id))
info := &PageInfo{
ID: id,
Count: int(p.count),
OverflowCount: int(p.overflow),
}
// Determine the type (or if it's free).
if tx.db.freelist.freed(pgid(id)) {
info.Type = "free"
} else {
info.Type = p.typ()
}
return info, nil
}
// TxStats represents statistics about the actions performed by the transaction.
type TxStats struct {
// Page statistics.
PageCount int // number of page allocations
PageAlloc int // total bytes allocated
// Cursor statistics.
CursorCount int // number of cursors created
// Node statistics
NodeCount int // number of node allocations
NodeDeref int // number of node dereferences
// Rebalance statistics.
Rebalance int // number of node rebalances
RebalanceTime time.Duration // total time spent rebalancing
// Split/Spill statistics.
Split int // number of nodes split
Spill int // number of nodes spilled
SpillTime time.Duration // total time spent spilling
// Write statistics.
Write int // number of writes performed
WriteTime time.Duration // total time spent writing to disk
}
func (s *TxStats) add(other *TxStats) {
s.PageCount += other.PageCount
s.PageAlloc += other.PageAlloc
s.CursorCount += other.CursorCount
s.NodeCount += other.NodeCount
s.NodeDeref += other.NodeDeref
s.Rebalance += other.Rebalance
s.RebalanceTime += other.RebalanceTime
s.Split += other.Split
s.Spill += other.Spill
s.SpillTime += other.SpillTime
s.Write += other.Write
s.WriteTime += other.WriteTime
}
// Sub calculates and returns the difference between two sets of transaction stats.
// This is useful when obtaining stats at two different points and time and
// you need the performance counters that occurred within that time span.
func (s *TxStats) Sub(other *TxStats) TxStats {
var diff TxStats
diff.PageCount = s.PageCount - other.PageCount
diff.PageAlloc = s.PageAlloc - other.PageAlloc
diff.CursorCount = s.CursorCount - other.CursorCount
diff.NodeCount = s.NodeCount - other.NodeCount
diff.NodeDeref = s.NodeDeref - other.NodeDeref
diff.Rebalance = s.Rebalance - other.Rebalance
diff.RebalanceTime = s.RebalanceTime - other.RebalanceTime
diff.Split = s.Split - other.Split
diff.Spill = s.Spill - other.Spill
diff.SpillTime = s.SpillTime - other.SpillTime
diff.Write = s.Write - other.Write
diff.WriteTime = s.WriteTime - other.WriteTime
return diff
}

View file

@ -0,0 +1,191 @@
Apache License
Version 2.0, January 2004
https://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Copyright 2013-2016 Docker, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View file

@ -0,0 +1,19 @@
Docker
Copyright 2012-2016 Docker, Inc.
This product includes software developed at Docker, Inc. (https://www.docker.com).
This product contains software (https://github.com/kr/pty) developed
by Keith Rarick, licensed under the MIT License.
The following is courtesy of our legal counsel:
Use and transfer of Docker may be subject to certain restrictions by the
United States and other governments.
It is your responsibility to ensure that your use and/or transfer does not
violate applicable laws.
For more information, please see https://www.bis.doc.gov
See also https://www.apache.org/dev/crypto.html and/or seek legal counsel.

View file

@ -0,0 +1,151 @@
package opts
import (
"fmt"
"net"
"net/url"
"strconv"
"strings"
)
var (
// DefaultHTTPPort Default HTTP Port used if only the protocol is provided to -H flag e.g. docker daemon -H tcp://
// These are the IANA registered port numbers for use with Docker
// see http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=docker
DefaultHTTPPort = 2375 // Default HTTP Port
// DefaultTLSHTTPPort Default HTTP Port used when TLS enabled
DefaultTLSHTTPPort = 2376 // Default TLS encrypted HTTP Port
// DefaultUnixSocket Path for the unix socket.
// Docker daemon by default always listens on the default unix socket
DefaultUnixSocket = "/var/run/docker.sock"
// DefaultTCPHost constant defines the default host string used by docker on Windows
DefaultTCPHost = fmt.Sprintf("tcp://%s:%d", DefaultHTTPHost, DefaultHTTPPort)
// DefaultTLSHost constant defines the default host string used by docker for TLS sockets
DefaultTLSHost = fmt.Sprintf("tcp://%s:%d", DefaultHTTPHost, DefaultTLSHTTPPort)
// DefaultNamedPipe defines the default named pipe used by docker on Windows
DefaultNamedPipe = `//./pipe/docker_engine`
)
// ValidateHost validates that the specified string is a valid host and returns it.
func ValidateHost(val string) (string, error) {
host := strings.TrimSpace(val)
// The empty string means default and is not handled by parseDockerDaemonHost
if host != "" {
_, err := parseDockerDaemonHost(host)
if err != nil {
return val, err
}
}
// Note: unlike most flag validators, we don't return the mutated value here
// we need to know what the user entered later (using ParseHost) to adjust for tls
return val, nil
}
// ParseHost and set defaults for a Daemon host string
func ParseHost(defaultToTLS bool, val string) (string, error) {
host := strings.TrimSpace(val)
if host == "" {
if defaultToTLS {
host = DefaultTLSHost
} else {
host = DefaultHost
}
} else {
var err error
host, err = parseDockerDaemonHost(host)
if err != nil {
return val, err
}
}
return host, nil
}
// parseDockerDaemonHost parses the specified address and returns an address that will be used as the host.
// Depending of the address specified, this may return one of the global Default* strings defined in hosts.go.
func parseDockerDaemonHost(addr string) (string, error) {
addrParts := strings.SplitN(addr, "://", 2)
if len(addrParts) == 1 && addrParts[0] != "" {
addrParts = []string{"tcp", addrParts[0]}
}
switch addrParts[0] {
case "tcp":
return ParseTCPAddr(addrParts[1], DefaultTCPHost)
case "unix":
return parseSimpleProtoAddr("unix", addrParts[1], DefaultUnixSocket)
case "npipe":
return parseSimpleProtoAddr("npipe", addrParts[1], DefaultNamedPipe)
case "fd":
return addr, nil
default:
return "", fmt.Errorf("Invalid bind address format: %s", addr)
}
}
// parseSimpleProtoAddr parses and validates that the specified address is a valid
// socket address for simple protocols like unix and npipe. It returns a formatted
// socket address, either using the address parsed from addr, or the contents of
// defaultAddr if addr is a blank string.
func parseSimpleProtoAddr(proto, addr, defaultAddr string) (string, error) {
addr = strings.TrimPrefix(addr, proto+"://")
if strings.Contains(addr, "://") {
return "", fmt.Errorf("Invalid proto, expected %s: %s", proto, addr)
}
if addr == "" {
addr = defaultAddr
}
return fmt.Sprintf("%s://%s", proto, addr), nil
}
// ParseTCPAddr parses and validates that the specified address is a valid TCP
// address. It returns a formatted TCP address, either using the address parsed
// from tryAddr, or the contents of defaultAddr if tryAddr is a blank string.
// tryAddr is expected to have already been Trim()'d
// defaultAddr must be in the full `tcp://host:port` form
func ParseTCPAddr(tryAddr string, defaultAddr string) (string, error) {
if tryAddr == "" || tryAddr == "tcp://" {
return defaultAddr, nil
}
addr := strings.TrimPrefix(tryAddr, "tcp://")
if strings.Contains(addr, "://") || addr == "" {
return "", fmt.Errorf("Invalid proto, expected tcp: %s", tryAddr)
}
defaultAddr = strings.TrimPrefix(defaultAddr, "tcp://")
defaultHost, defaultPort, err := net.SplitHostPort(defaultAddr)
if err != nil {
return "", err
}
// url.Parse fails for trailing colon on IPv6 brackets on Go 1.5, but
// not 1.4. See https://github.com/golang/go/issues/12200 and
// https://github.com/golang/go/issues/6530.
if strings.HasSuffix(addr, "]:") {
addr += defaultPort
}
u, err := url.Parse("tcp://" + addr)
if err != nil {
return "", err
}
host, port, err := net.SplitHostPort(u.Host)
if err != nil {
// try port addition once
host, port, err = net.SplitHostPort(net.JoinHostPort(u.Host, defaultPort))
}
if err != nil {
return "", fmt.Errorf("Invalid bind address format: %s", tryAddr)
}
if host == "" {
host = defaultHost
}
if port == "" {
port = defaultPort
}
p, err := strconv.Atoi(port)
if err != nil && p == 0 {
return "", fmt.Errorf("Invalid bind address format: %s", tryAddr)
}
return fmt.Sprintf("tcp://%s%s", net.JoinHostPort(host, port), u.Path), nil
}

View file

@ -0,0 +1,8 @@
// +build !windows
package opts
import "fmt"
// DefaultHost constant defines the default host string used by docker on other hosts than Windows
var DefaultHost = fmt.Sprintf("unix://%s", DefaultUnixSocket)

View file

@ -0,0 +1,6 @@
// +build windows
package opts
// DefaultHost constant defines the default host string used by docker on Windows
var DefaultHost = "npipe://" + DefaultNamedPipe

View file

@ -0,0 +1,42 @@
package opts
import (
"fmt"
"net"
)
// IPOpt holds an IP. It is used to store values from CLI flags.
type IPOpt struct {
*net.IP
}
// NewIPOpt creates a new IPOpt from a reference net.IP and a
// string representation of an IP. If the string is not a valid
// IP it will fallback to the specified reference.
func NewIPOpt(ref *net.IP, defaultVal string) *IPOpt {
o := &IPOpt{
IP: ref,
}
o.Set(defaultVal)
return o
}
// Set sets an IPv4 or IPv6 address from a given string. If the given
// string is not parseable as an IP address it returns an error.
func (o *IPOpt) Set(val string) error {
ip := net.ParseIP(val)
if ip == nil {
return fmt.Errorf("%s is not an ip address", val)
}
*o.IP = ip
return nil
}
// String returns the IP address stored in the IPOpt. If stored IP is a
// nil pointer, it returns an empty string.
func (o *IPOpt) String() string {
if *o.IP == nil {
return ""
}
return o.IP.String()
}

View file

@ -0,0 +1,321 @@
package opts
import (
"fmt"
"net"
"regexp"
"strings"
"github.com/docker/engine-api/types/filters"
)
var (
alphaRegexp = regexp.MustCompile(`[a-zA-Z]`)
domainRegexp = regexp.MustCompile(`^(:?(:?[a-zA-Z0-9]|(:?[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9]))(:?\.(:?[a-zA-Z0-9]|(:?[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])))*)\.?\s*$`)
)
// ListOpts holds a list of values and a validation function.
type ListOpts struct {
values *[]string
validator ValidatorFctType
}
// NewListOpts creates a new ListOpts with the specified validator.
func NewListOpts(validator ValidatorFctType) ListOpts {
var values []string
return *NewListOptsRef(&values, validator)
}
// NewListOptsRef creates a new ListOpts with the specified values and validator.
func NewListOptsRef(values *[]string, validator ValidatorFctType) *ListOpts {
return &ListOpts{
values: values,
validator: validator,
}
}
func (opts *ListOpts) String() string {
return fmt.Sprintf("%v", []string((*opts.values)))
}
// Set validates if needed the input value and adds it to the
// internal slice.
func (opts *ListOpts) Set(value string) error {
if opts.validator != nil {
v, err := opts.validator(value)
if err != nil {
return err
}
value = v
}
(*opts.values) = append((*opts.values), value)
return nil
}
// Delete removes the specified element from the slice.
func (opts *ListOpts) Delete(key string) {
for i, k := range *opts.values {
if k == key {
(*opts.values) = append((*opts.values)[:i], (*opts.values)[i+1:]...)
return
}
}
}
// GetMap returns the content of values in a map in order to avoid
// duplicates.
func (opts *ListOpts) GetMap() map[string]struct{} {
ret := make(map[string]struct{})
for _, k := range *opts.values {
ret[k] = struct{}{}
}
return ret
}
// GetAll returns the values of slice.
func (opts *ListOpts) GetAll() []string {
return (*opts.values)
}
// GetAllOrEmpty returns the values of the slice
// or an empty slice when there are no values.
func (opts *ListOpts) GetAllOrEmpty() []string {
v := *opts.values
if v == nil {
return make([]string, 0)
}
return v
}
// Get checks the existence of the specified key.
func (opts *ListOpts) Get(key string) bool {
for _, k := range *opts.values {
if k == key {
return true
}
}
return false
}
// Len returns the amount of element in the slice.
func (opts *ListOpts) Len() int {
return len((*opts.values))
}
// Type returns a string name for this Option type
func (opts *ListOpts) Type() string {
return "list"
}
// NamedOption is an interface that list and map options
// with names implement.
type NamedOption interface {
Name() string
}
// NamedListOpts is a ListOpts with a configuration name.
// This struct is useful to keep reference to the assigned
// field name in the internal configuration struct.
type NamedListOpts struct {
name string
ListOpts
}
var _ NamedOption = &NamedListOpts{}
// NewNamedListOptsRef creates a reference to a new NamedListOpts struct.
func NewNamedListOptsRef(name string, values *[]string, validator ValidatorFctType) *NamedListOpts {
return &NamedListOpts{
name: name,
ListOpts: *NewListOptsRef(values, validator),
}
}
// Name returns the name of the NamedListOpts in the configuration.
func (o *NamedListOpts) Name() string {
return o.name
}
//MapOpts holds a map of values and a validation function.
type MapOpts struct {
values map[string]string
validator ValidatorFctType
}
// Set validates if needed the input value and add it to the
// internal map, by splitting on '='.
func (opts *MapOpts) Set(value string) error {
if opts.validator != nil {
v, err := opts.validator(value)
if err != nil {
return err
}
value = v
}
vals := strings.SplitN(value, "=", 2)
if len(vals) == 1 {
(opts.values)[vals[0]] = ""
} else {
(opts.values)[vals[0]] = vals[1]
}
return nil
}
// GetAll returns the values of MapOpts as a map.
func (opts *MapOpts) GetAll() map[string]string {
return opts.values
}
func (opts *MapOpts) String() string {
return fmt.Sprintf("%v", map[string]string((opts.values)))
}
// Type returns a string name for this Option type
func (opts *MapOpts) Type() string {
return "map"
}
// NewMapOpts creates a new MapOpts with the specified map of values and a validator.
func NewMapOpts(values map[string]string, validator ValidatorFctType) *MapOpts {
if values == nil {
values = make(map[string]string)
}
return &MapOpts{
values: values,
validator: validator,
}
}
// NamedMapOpts is a MapOpts struct with a configuration name.
// This struct is useful to keep reference to the assigned
// field name in the internal configuration struct.
type NamedMapOpts struct {
name string
MapOpts
}
var _ NamedOption = &NamedMapOpts{}
// NewNamedMapOpts creates a reference to a new NamedMapOpts struct.
func NewNamedMapOpts(name string, values map[string]string, validator ValidatorFctType) *NamedMapOpts {
return &NamedMapOpts{
name: name,
MapOpts: *NewMapOpts(values, validator),
}
}
// Name returns the name of the NamedMapOpts in the configuration.
func (o *NamedMapOpts) Name() string {
return o.name
}
// ValidatorFctType defines a validator function that returns a validated string and/or an error.
type ValidatorFctType func(val string) (string, error)
// ValidatorFctListType defines a validator function that returns a validated list of string and/or an error
type ValidatorFctListType func(val string) ([]string, error)
// ValidateIPAddress validates an Ip address.
func ValidateIPAddress(val string) (string, error) {
var ip = net.ParseIP(strings.TrimSpace(val))
if ip != nil {
return ip.String(), nil
}
return "", fmt.Errorf("%s is not an ip address", val)
}
// ValidateDNSSearch validates domain for resolvconf search configuration.
// A zero length domain is represented by a dot (.).
func ValidateDNSSearch(val string) (string, error) {
if val = strings.Trim(val, " "); val == "." {
return val, nil
}
return validateDomain(val)
}
func validateDomain(val string) (string, error) {
if alphaRegexp.FindString(val) == "" {
return "", fmt.Errorf("%s is not a valid domain", val)
}
ns := domainRegexp.FindSubmatch([]byte(val))
if len(ns) > 0 && len(ns[1]) < 255 {
return string(ns[1]), nil
}
return "", fmt.Errorf("%s is not a valid domain", val)
}
// ValidateLabel validates that the specified string is a valid label, and returns it.
// Labels are in the form on key=value.
func ValidateLabel(val string) (string, error) {
if strings.Count(val, "=") < 1 {
return "", fmt.Errorf("bad attribute format: %s", val)
}
return val, nil
}
// ValidateSysctl validates a sysctl and returns it.
func ValidateSysctl(val string) (string, error) {
validSysctlMap := map[string]bool{
"kernel.msgmax": true,
"kernel.msgmnb": true,
"kernel.msgmni": true,
"kernel.sem": true,
"kernel.shmall": true,
"kernel.shmmax": true,
"kernel.shmmni": true,
"kernel.shm_rmid_forced": true,
}
validSysctlPrefixes := []string{
"net.",
"fs.mqueue.",
}
arr := strings.Split(val, "=")
if len(arr) < 2 {
return "", fmt.Errorf("sysctl '%s' is not whitelisted", val)
}
if validSysctlMap[arr[0]] {
return val, nil
}
for _, vp := range validSysctlPrefixes {
if strings.HasPrefix(arr[0], vp) {
return val, nil
}
}
return "", fmt.Errorf("sysctl '%s' is not whitelisted", val)
}
// FilterOpt is a flag type for validating filters
type FilterOpt struct {
filter filters.Args
}
// NewFilterOpt returns a new FilterOpt
func NewFilterOpt() FilterOpt {
return FilterOpt{filter: filters.NewArgs()}
}
func (o *FilterOpt) String() string {
repr, err := filters.ToParam(o.filter)
if err != nil {
return "invalid filters"
}
return repr
}
// Set sets the value of the opt by parsing the command line value
func (o *FilterOpt) Set(value string) error {
var err error
o.filter, err = filters.ParseFlag(value, o.filter)
return err
}
// Type returns the option type
func (o *FilterOpt) Type() string {
return "filter"
}
// Value returns the value of this option
func (o *FilterOpt) Value() filters.Args {
return o.filter
}

View file

@ -0,0 +1,6 @@
// +build !windows
package opts
// DefaultHTTPHost Default HTTP Host used if only port is provided to -H flag e.g. docker daemon -H tcp://:8080
const DefaultHTTPHost = "localhost"

View file

@ -0,0 +1,56 @@
package opts
// TODO Windows. Identify bug in GOLang 1.5.1+ and/or Windows Server 2016 TP5.
// @jhowardmsft, @swernli.
//
// On Windows, this mitigates a problem with the default options of running
// a docker client against a local docker daemon on TP5.
//
// What was found that if the default host is "localhost", even if the client
// (and daemon as this is local) is not physically on a network, and the DNS
// cache is flushed (ipconfig /flushdns), then the client will pause for
// exactly one second when connecting to the daemon for calls. For example
// using docker run windowsservercore cmd, the CLI will send a create followed
// by an attach. You see the delay between the attach finishing and the attach
// being seen by the daemon.
//
// Here's some daemon debug logs with additional debug spew put in. The
// AfterWriteJSON log is the very last thing the daemon does as part of the
// create call. The POST /attach is the second CLI call. Notice the second
// time gap.
//
// time="2015-11-06T13:38:37.259627400-08:00" level=debug msg="After createRootfs"
// time="2015-11-06T13:38:37.263626300-08:00" level=debug msg="After setHostConfig"
// time="2015-11-06T13:38:37.267631200-08:00" level=debug msg="before createContainerPl...."
// time="2015-11-06T13:38:37.271629500-08:00" level=debug msg=ToDiskLocking....
// time="2015-11-06T13:38:37.275643200-08:00" level=debug msg="loggin event...."
// time="2015-11-06T13:38:37.277627600-08:00" level=debug msg="logged event...."
// time="2015-11-06T13:38:37.279631800-08:00" level=debug msg="In defer func"
// time="2015-11-06T13:38:37.282628100-08:00" level=debug msg="After daemon.create"
// time="2015-11-06T13:38:37.286651700-08:00" level=debug msg="return 2"
// time="2015-11-06T13:38:37.289629500-08:00" level=debug msg="Returned from daemon.ContainerCreate"
// time="2015-11-06T13:38:37.311629100-08:00" level=debug msg="After WriteJSON"
// ... 1 second gap here....
// time="2015-11-06T13:38:38.317866200-08:00" level=debug msg="Calling POST /v1.22/containers/984758282b842f779e805664b2c95d563adc9a979c8a3973e68c807843ee4757/attach"
// time="2015-11-06T13:38:38.326882500-08:00" level=info msg="POST /v1.22/containers/984758282b842f779e805664b2c95d563adc9a979c8a3973e68c807843ee4757/attach?stderr=1&stdin=1&stdout=1&stream=1"
//
// We suspect this is either a bug introduced in GOLang 1.5.1, or that a change
// in GOLang 1.5.1 (from 1.4.3) is exposing a bug in Windows. In theory,
// the Windows networking stack is supposed to resolve "localhost" internally,
// without hitting DNS, or even reading the hosts file (which is why localhost
// is commented out in the hosts file on Windows).
//
// We have validated that working around this using the actual IPv4 localhost
// address does not cause the delay.
//
// This does not occur with the docker client built with 1.4.3 on the same
// Windows build, regardless of whether the daemon is built using 1.5.1
// or 1.4.3. It does not occur on Linux. We also verified we see the same thing
// on a cross-compiled Windows binary (from Linux).
//
// Final note: This is a mitigation, not a 'real' fix. It is still susceptible
// to the delay if a user were to do 'docker run -H=tcp://localhost:2375...'
// explicitly.
// DefaultHTTPHost Default HTTP Host used if only port is provided to -H flag e.g. docker daemon -H tcp://:8080
const DefaultHTTPHost = "127.0.0.1"

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,91 @@
package archive
import (
"archive/tar"
"os"
"path/filepath"
"strings"
"syscall"
"github.com/docker/docker/pkg/system"
)
func getWhiteoutConverter(format WhiteoutFormat) tarWhiteoutConverter {
if format == OverlayWhiteoutFormat {
return overlayWhiteoutConverter{}
}
return nil
}
type overlayWhiteoutConverter struct{}
func (overlayWhiteoutConverter) ConvertWrite(hdr *tar.Header, path string, fi os.FileInfo) error {
// convert whiteouts to AUFS format
if fi.Mode()&os.ModeCharDevice != 0 && hdr.Devmajor == 0 && hdr.Devminor == 0 {
// we just rename the file and make it normal
dir, filename := filepath.Split(hdr.Name)
hdr.Name = filepath.Join(dir, WhiteoutPrefix+filename)
hdr.Mode = 0600
hdr.Typeflag = tar.TypeReg
hdr.Size = 0
}
if fi.Mode()&os.ModeDir != 0 {
// convert opaque dirs to AUFS format by writing an empty file with the prefix
opaque, err := system.Lgetxattr(path, "trusted.overlay.opaque")
if err != nil {
return err
}
if opaque != nil && len(opaque) == 1 && opaque[0] == 'y' {
// create a header for the whiteout file
// it should inherit some properties from the parent, but be a regular file
*hdr = tar.Header{
Typeflag: tar.TypeReg,
Mode: hdr.Mode & int64(os.ModePerm),
Name: filepath.Join(hdr.Name, WhiteoutOpaqueDir),
Size: 0,
Uid: hdr.Uid,
Uname: hdr.Uname,
Gid: hdr.Gid,
Gname: hdr.Gname,
AccessTime: hdr.AccessTime,
ChangeTime: hdr.ChangeTime,
}
}
}
return nil
}
func (overlayWhiteoutConverter) ConvertRead(hdr *tar.Header, path string) (bool, error) {
base := filepath.Base(path)
dir := filepath.Dir(path)
// if a directory is marked as opaque by the AUFS special file, we need to translate that to overlay
if base == WhiteoutOpaqueDir {
if err := syscall.Setxattr(dir, "trusted.overlay.opaque", []byte{'y'}, 0); err != nil {
return false, err
}
// don't write the file itself
return false, nil
}
// if a file was deleted and we are using overlay, we need to create a character device
if strings.HasPrefix(base, WhiteoutPrefix) {
originalBase := base[len(WhiteoutPrefix):]
originalPath := filepath.Join(dir, originalBase)
if err := syscall.Mknod(originalPath, syscall.S_IFCHR, 0); err != nil {
return false, err
}
if err := os.Chown(originalPath, hdr.Uid, hdr.Gid); err != nil {
return false, err
}
// don't write the file itself
return false, nil
}
return true, nil
}

View file

@ -0,0 +1,7 @@
// +build !linux
package archive
func getWhiteoutConverter(format WhiteoutFormat) tarWhiteoutConverter {
return nil
}

View file

@ -0,0 +1,112 @@
// +build !windows
package archive
import (
"archive/tar"
"errors"
"os"
"path/filepath"
"syscall"
"github.com/docker/docker/pkg/system"
)
// fixVolumePathPrefix does platform specific processing to ensure that if
// the path being passed in is not in a volume path format, convert it to one.
func fixVolumePathPrefix(srcPath string) string {
return srcPath
}
// getWalkRoot calculates the root path when performing a TarWithOptions.
// We use a separate function as this is platform specific. On Linux, we
// can't use filepath.Join(srcPath,include) because this will clean away
// a trailing "." or "/" which may be important.
func getWalkRoot(srcPath string, include string) string {
return srcPath + string(filepath.Separator) + include
}
// CanonicalTarNameForPath returns platform-specific filepath
// to canonical posix-style path for tar archival. p is relative
// path.
func CanonicalTarNameForPath(p string) (string, error) {
return p, nil // already unix-style
}
// chmodTarEntry is used to adjust the file permissions used in tar header based
// on the platform the archival is done.
func chmodTarEntry(perm os.FileMode) os.FileMode {
return perm // noop for unix as golang APIs provide perm bits correctly
}
func setHeaderForSpecialDevice(hdr *tar.Header, ta *tarAppender, name string, stat interface{}) (inode uint64, err error) {
s, ok := stat.(*syscall.Stat_t)
if !ok {
err = errors.New("cannot convert stat value to syscall.Stat_t")
return
}
inode = uint64(s.Ino)
// Currently go does not fill in the major/minors
if s.Mode&syscall.S_IFBLK != 0 ||
s.Mode&syscall.S_IFCHR != 0 {
hdr.Devmajor = int64(major(uint64(s.Rdev)))
hdr.Devminor = int64(minor(uint64(s.Rdev)))
}
return
}
func getFileUIDGID(stat interface{}) (int, int, error) {
s, ok := stat.(*syscall.Stat_t)
if !ok {
return -1, -1, errors.New("cannot convert stat value to syscall.Stat_t")
}
return int(s.Uid), int(s.Gid), nil
}
func major(device uint64) uint64 {
return (device >> 8) & 0xfff
}
func minor(device uint64) uint64 {
return (device & 0xff) | ((device >> 12) & 0xfff00)
}
// handleTarTypeBlockCharFifo is an OS-specific helper function used by
// createTarFile to handle the following types of header: Block; Char; Fifo
func handleTarTypeBlockCharFifo(hdr *tar.Header, path string) error {
mode := uint32(hdr.Mode & 07777)
switch hdr.Typeflag {
case tar.TypeBlock:
mode |= syscall.S_IFBLK
case tar.TypeChar:
mode |= syscall.S_IFCHR
case tar.TypeFifo:
mode |= syscall.S_IFIFO
}
if err := system.Mknod(path, mode, int(system.Mkdev(hdr.Devmajor, hdr.Devminor))); err != nil {
return err
}
return nil
}
func handleLChmod(hdr *tar.Header, path string, hdrInfo os.FileInfo) error {
if hdr.Typeflag == tar.TypeLink {
if fi, err := os.Lstat(hdr.Linkname); err == nil && (fi.Mode()&os.ModeSymlink == 0) {
if err := os.Chmod(path, hdrInfo.Mode()); err != nil {
return err
}
}
} else if hdr.Typeflag != tar.TypeSymlink {
if err := os.Chmod(path, hdrInfo.Mode()); err != nil {
return err
}
}
return nil
}

View file

@ -0,0 +1,70 @@
// +build windows
package archive
import (
"archive/tar"
"fmt"
"os"
"path/filepath"
"strings"
"github.com/docker/docker/pkg/longpath"
)
// fixVolumePathPrefix does platform specific processing to ensure that if
// the path being passed in is not in a volume path format, convert it to one.
func fixVolumePathPrefix(srcPath string) string {
return longpath.AddPrefix(srcPath)
}
// getWalkRoot calculates the root path when performing a TarWithOptions.
// We use a separate function as this is platform specific.
func getWalkRoot(srcPath string, include string) string {
return filepath.Join(srcPath, include)
}
// CanonicalTarNameForPath returns platform-specific filepath
// to canonical posix-style path for tar archival. p is relative
// path.
func CanonicalTarNameForPath(p string) (string, error) {
// windows: convert windows style relative path with backslashes
// into forward slashes. Since windows does not allow '/' or '\'
// in file names, it is mostly safe to replace however we must
// check just in case
if strings.Contains(p, "/") {
return "", fmt.Errorf("Windows path contains forward slash: %s", p)
}
return strings.Replace(p, string(os.PathSeparator), "/", -1), nil
}
// chmodTarEntry is used to adjust the file permissions used in tar header based
// on the platform the archival is done.
func chmodTarEntry(perm os.FileMode) os.FileMode {
perm &= 0755
// Add the x bit: make everything +x from windows
perm |= 0111
return perm
}
func setHeaderForSpecialDevice(hdr *tar.Header, ta *tarAppender, name string, stat interface{}) (inode uint64, err error) {
// do nothing. no notion of Rdev, Inode, Nlink in stat on Windows
return
}
// handleTarTypeBlockCharFifo is an OS-specific helper function used by
// createTarFile to handle the following types of header: Block; Char; Fifo
func handleTarTypeBlockCharFifo(hdr *tar.Header, path string) error {
return nil
}
func handleLChmod(hdr *tar.Header, path string, hdrInfo os.FileInfo) error {
return nil
}
func getFileUIDGID(stat interface{}) (int, int, error) {
// no notion of file ownership mapping yet on Windows
return 0, 0, nil
}

View file

@ -0,0 +1,446 @@
package archive
import (
"archive/tar"
"bytes"
"fmt"
"io"
"io/ioutil"
"os"
"path/filepath"
"sort"
"strings"
"syscall"
"time"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/pkg/idtools"
"github.com/docker/docker/pkg/pools"
"github.com/docker/docker/pkg/system"
)
// ChangeType represents the change type.
type ChangeType int
const (
// ChangeModify represents the modify operation.
ChangeModify = iota
// ChangeAdd represents the add operation.
ChangeAdd
// ChangeDelete represents the delete operation.
ChangeDelete
)
func (c ChangeType) String() string {
switch c {
case ChangeModify:
return "C"
case ChangeAdd:
return "A"
case ChangeDelete:
return "D"
}
return ""
}
// Change represents a change, it wraps the change type and path.
// It describes changes of the files in the path respect to the
// parent layers. The change could be modify, add, delete.
// This is used for layer diff.
type Change struct {
Path string
Kind ChangeType
}
func (change *Change) String() string {
return fmt.Sprintf("%s %s", change.Kind, change.Path)
}
// for sort.Sort
type changesByPath []Change
func (c changesByPath) Less(i, j int) bool { return c[i].Path < c[j].Path }
func (c changesByPath) Len() int { return len(c) }
func (c changesByPath) Swap(i, j int) { c[j], c[i] = c[i], c[j] }
// Gnu tar and the go tar writer don't have sub-second mtime
// precision, which is problematic when we apply changes via tar
// files, we handle this by comparing for exact times, *or* same
// second count and either a or b having exactly 0 nanoseconds
func sameFsTime(a, b time.Time) bool {
return a == b ||
(a.Unix() == b.Unix() &&
(a.Nanosecond() == 0 || b.Nanosecond() == 0))
}
func sameFsTimeSpec(a, b syscall.Timespec) bool {
return a.Sec == b.Sec &&
(a.Nsec == b.Nsec || a.Nsec == 0 || b.Nsec == 0)
}
// Changes walks the path rw and determines changes for the files in the path,
// with respect to the parent layers
func Changes(layers []string, rw string) ([]Change, error) {
return changes(layers, rw, aufsDeletedFile, aufsMetadataSkip)
}
func aufsMetadataSkip(path string) (skip bool, err error) {
skip, err = filepath.Match(string(os.PathSeparator)+WhiteoutMetaPrefix+"*", path)
if err != nil {
skip = true
}
return
}
func aufsDeletedFile(root, path string, fi os.FileInfo) (string, error) {
f := filepath.Base(path)
// If there is a whiteout, then the file was removed
if strings.HasPrefix(f, WhiteoutPrefix) {
originalFile := f[len(WhiteoutPrefix):]
return filepath.Join(filepath.Dir(path), originalFile), nil
}
return "", nil
}
type skipChange func(string) (bool, error)
type deleteChange func(string, string, os.FileInfo) (string, error)
func changes(layers []string, rw string, dc deleteChange, sc skipChange) ([]Change, error) {
var (
changes []Change
changedDirs = make(map[string]struct{})
)
err := filepath.Walk(rw, func(path string, f os.FileInfo, err error) error {
if err != nil {
return err
}
// Rebase path
path, err = filepath.Rel(rw, path)
if err != nil {
return err
}
// As this runs on the daemon side, file paths are OS specific.
path = filepath.Join(string(os.PathSeparator), path)
// Skip root
if path == string(os.PathSeparator) {
return nil
}
if sc != nil {
if skip, err := sc(path); skip {
return err
}
}
change := Change{
Path: path,
}
deletedFile, err := dc(rw, path, f)
if err != nil {
return err
}
// Find out what kind of modification happened
if deletedFile != "" {
change.Path = deletedFile
change.Kind = ChangeDelete
} else {
// Otherwise, the file was added
change.Kind = ChangeAdd
// ...Unless it already existed in a top layer, in which case, it's a modification
for _, layer := range layers {
stat, err := os.Stat(filepath.Join(layer, path))
if err != nil && !os.IsNotExist(err) {
return err
}
if err == nil {
// The file existed in the top layer, so that's a modification
// However, if it's a directory, maybe it wasn't actually modified.
// If you modify /foo/bar/baz, then /foo will be part of the changed files only because it's the parent of bar
if stat.IsDir() && f.IsDir() {
if f.Size() == stat.Size() && f.Mode() == stat.Mode() && sameFsTime(f.ModTime(), stat.ModTime()) {
// Both directories are the same, don't record the change
return nil
}
}
change.Kind = ChangeModify
break
}
}
}
// If /foo/bar/file.txt is modified, then /foo/bar must be part of the changed files.
// This block is here to ensure the change is recorded even if the
// modify time, mode and size of the parent directory in the rw and ro layers are all equal.
// Check https://github.com/docker/docker/pull/13590 for details.
if f.IsDir() {
changedDirs[path] = struct{}{}
}
if change.Kind == ChangeAdd || change.Kind == ChangeDelete {
parent := filepath.Dir(path)
if _, ok := changedDirs[parent]; !ok && parent != "/" {
changes = append(changes, Change{Path: parent, Kind: ChangeModify})
changedDirs[parent] = struct{}{}
}
}
// Record change
changes = append(changes, change)
return nil
})
if err != nil && !os.IsNotExist(err) {
return nil, err
}
return changes, nil
}
// FileInfo describes the information of a file.
type FileInfo struct {
parent *FileInfo
name string
stat *system.StatT
children map[string]*FileInfo
capability []byte
added bool
}
// LookUp looks up the file information of a file.
func (info *FileInfo) LookUp(path string) *FileInfo {
// As this runs on the daemon side, file paths are OS specific.
parent := info
if path == string(os.PathSeparator) {
return info
}
pathElements := strings.Split(path, string(os.PathSeparator))
for _, elem := range pathElements {
if elem != "" {
child := parent.children[elem]
if child == nil {
return nil
}
parent = child
}
}
return parent
}
func (info *FileInfo) path() string {
if info.parent == nil {
// As this runs on the daemon side, file paths are OS specific.
return string(os.PathSeparator)
}
return filepath.Join(info.parent.path(), info.name)
}
func (info *FileInfo) addChanges(oldInfo *FileInfo, changes *[]Change) {
sizeAtEntry := len(*changes)
if oldInfo == nil {
// add
change := Change{
Path: info.path(),
Kind: ChangeAdd,
}
*changes = append(*changes, change)
info.added = true
}
// We make a copy so we can modify it to detect additions
// also, we only recurse on the old dir if the new info is a directory
// otherwise any previous delete/change is considered recursive
oldChildren := make(map[string]*FileInfo)
if oldInfo != nil && info.isDir() {
for k, v := range oldInfo.children {
oldChildren[k] = v
}
}
for name, newChild := range info.children {
oldChild, _ := oldChildren[name]
if oldChild != nil {
// change?
oldStat := oldChild.stat
newStat := newChild.stat
// Note: We can't compare inode or ctime or blocksize here, because these change
// when copying a file into a container. However, that is not generally a problem
// because any content change will change mtime, and any status change should
// be visible when actually comparing the stat fields. The only time this
// breaks down is if some code intentionally hides a change by setting
// back mtime
if statDifferent(oldStat, newStat) ||
bytes.Compare(oldChild.capability, newChild.capability) != 0 {
change := Change{
Path: newChild.path(),
Kind: ChangeModify,
}
*changes = append(*changes, change)
newChild.added = true
}
// Remove from copy so we can detect deletions
delete(oldChildren, name)
}
newChild.addChanges(oldChild, changes)
}
for _, oldChild := range oldChildren {
// delete
change := Change{
Path: oldChild.path(),
Kind: ChangeDelete,
}
*changes = append(*changes, change)
}
// If there were changes inside this directory, we need to add it, even if the directory
// itself wasn't changed. This is needed to properly save and restore filesystem permissions.
// As this runs on the daemon side, file paths are OS specific.
if len(*changes) > sizeAtEntry && info.isDir() && !info.added && info.path() != string(os.PathSeparator) {
change := Change{
Path: info.path(),
Kind: ChangeModify,
}
// Let's insert the directory entry before the recently added entries located inside this dir
*changes = append(*changes, change) // just to resize the slice, will be overwritten
copy((*changes)[sizeAtEntry+1:], (*changes)[sizeAtEntry:])
(*changes)[sizeAtEntry] = change
}
}
// Changes add changes to file information.
func (info *FileInfo) Changes(oldInfo *FileInfo) []Change {
var changes []Change
info.addChanges(oldInfo, &changes)
return changes
}
func newRootFileInfo() *FileInfo {
// As this runs on the daemon side, file paths are OS specific.
root := &FileInfo{
name: string(os.PathSeparator),
children: make(map[string]*FileInfo),
}
return root
}
// ChangesDirs compares two directories and generates an array of Change objects describing the changes.
// If oldDir is "", then all files in newDir will be Add-Changes.
func ChangesDirs(newDir, oldDir string) ([]Change, error) {
var (
oldRoot, newRoot *FileInfo
)
if oldDir == "" {
emptyDir, err := ioutil.TempDir("", "empty")
if err != nil {
return nil, err
}
defer os.Remove(emptyDir)
oldDir = emptyDir
}
oldRoot, newRoot, err := collectFileInfoForChanges(oldDir, newDir)
if err != nil {
return nil, err
}
return newRoot.Changes(oldRoot), nil
}
// ChangesSize calculates the size in bytes of the provided changes, based on newDir.
func ChangesSize(newDir string, changes []Change) int64 {
var (
size int64
sf = make(map[uint64]struct{})
)
for _, change := range changes {
if change.Kind == ChangeModify || change.Kind == ChangeAdd {
file := filepath.Join(newDir, change.Path)
fileInfo, err := os.Lstat(file)
if err != nil {
logrus.Errorf("Can not stat %q: %s", file, err)
continue
}
if fileInfo != nil && !fileInfo.IsDir() {
if hasHardlinks(fileInfo) {
inode := getIno(fileInfo)
if _, ok := sf[inode]; !ok {
size += fileInfo.Size()
sf[inode] = struct{}{}
}
} else {
size += fileInfo.Size()
}
}
}
}
return size
}
// ExportChanges produces an Archive from the provided changes, relative to dir.
func ExportChanges(dir string, changes []Change, uidMaps, gidMaps []idtools.IDMap) (Archive, error) {
reader, writer := io.Pipe()
go func() {
ta := &tarAppender{
TarWriter: tar.NewWriter(writer),
Buffer: pools.BufioWriter32KPool.Get(nil),
SeenFiles: make(map[uint64]string),
UIDMaps: uidMaps,
GIDMaps: gidMaps,
}
// this buffer is needed for the duration of this piped stream
defer pools.BufioWriter32KPool.Put(ta.Buffer)
sort.Sort(changesByPath(changes))
// In general we log errors here but ignore them because
// during e.g. a diff operation the container can continue
// mutating the filesystem and we can see transient errors
// from this
for _, change := range changes {
if change.Kind == ChangeDelete {
whiteOutDir := filepath.Dir(change.Path)
whiteOutBase := filepath.Base(change.Path)
whiteOut := filepath.Join(whiteOutDir, WhiteoutPrefix+whiteOutBase)
timestamp := time.Now()
hdr := &tar.Header{
Name: whiteOut[1:],
Size: 0,
ModTime: timestamp,
AccessTime: timestamp,
ChangeTime: timestamp,
}
if err := ta.TarWriter.WriteHeader(hdr); err != nil {
logrus.Debugf("Can't write whiteout header: %s", err)
}
} else {
path := filepath.Join(dir, change.Path)
if err := ta.addTarFile(path, change.Path[1:]); err != nil {
logrus.Debugf("Can't add file %s to tar: %s", path, err)
}
}
}
// Make sure to check the error on Close.
if err := ta.TarWriter.Close(); err != nil {
logrus.Debugf("Can't close layer: %s", err)
}
if err := writer.Close(); err != nil {
logrus.Debugf("failed close Changes writer: %s", err)
}
}()
return reader, nil
}

View file

@ -0,0 +1,312 @@
package archive
import (
"bytes"
"fmt"
"os"
"path/filepath"
"sort"
"syscall"
"unsafe"
"github.com/docker/docker/pkg/system"
)
// walker is used to implement collectFileInfoForChanges on linux. Where this
// method in general returns the entire contents of two directory trees, we
// optimize some FS calls out on linux. In particular, we take advantage of the
// fact that getdents(2) returns the inode of each file in the directory being
// walked, which, when walking two trees in parallel to generate a list of
// changes, can be used to prune subtrees without ever having to lstat(2) them
// directly. Eliminating stat calls in this way can save up to seconds on large
// images.
type walker struct {
dir1 string
dir2 string
root1 *FileInfo
root2 *FileInfo
}
// collectFileInfoForChanges returns a complete representation of the trees
// rooted at dir1 and dir2, with one important exception: any subtree or
// leaf where the inode and device numbers are an exact match between dir1
// and dir2 will be pruned from the results. This method is *only* to be used
// to generating a list of changes between the two directories, as it does not
// reflect the full contents.
func collectFileInfoForChanges(dir1, dir2 string) (*FileInfo, *FileInfo, error) {
w := &walker{
dir1: dir1,
dir2: dir2,
root1: newRootFileInfo(),
root2: newRootFileInfo(),
}
i1, err := os.Lstat(w.dir1)
if err != nil {
return nil, nil, err
}
i2, err := os.Lstat(w.dir2)
if err != nil {
return nil, nil, err
}
if err := w.walk("/", i1, i2); err != nil {
return nil, nil, err
}
return w.root1, w.root2, nil
}
// Given a FileInfo, its path info, and a reference to the root of the tree
// being constructed, register this file with the tree.
func walkchunk(path string, fi os.FileInfo, dir string, root *FileInfo) error {
if fi == nil {
return nil
}
parent := root.LookUp(filepath.Dir(path))
if parent == nil {
return fmt.Errorf("collectFileInfoForChanges: Unexpectedly no parent for %s", path)
}
info := &FileInfo{
name: filepath.Base(path),
children: make(map[string]*FileInfo),
parent: parent,
}
cpath := filepath.Join(dir, path)
stat, err := system.FromStatT(fi.Sys().(*syscall.Stat_t))
if err != nil {
return err
}
info.stat = stat
info.capability, _ = system.Lgetxattr(cpath, "security.capability") // lgetxattr(2): fs access
parent.children[info.name] = info
return nil
}
// Walk a subtree rooted at the same path in both trees being iterated. For
// example, /docker/overlay/1234/a/b/c/d and /docker/overlay/8888/a/b/c/d
func (w *walker) walk(path string, i1, i2 os.FileInfo) (err error) {
// Register these nodes with the return trees, unless we're still at the
// (already-created) roots:
if path != "/" {
if err := walkchunk(path, i1, w.dir1, w.root1); err != nil {
return err
}
if err := walkchunk(path, i2, w.dir2, w.root2); err != nil {
return err
}
}
is1Dir := i1 != nil && i1.IsDir()
is2Dir := i2 != nil && i2.IsDir()
sameDevice := false
if i1 != nil && i2 != nil {
si1 := i1.Sys().(*syscall.Stat_t)
si2 := i2.Sys().(*syscall.Stat_t)
if si1.Dev == si2.Dev {
sameDevice = true
}
}
// If these files are both non-existent, or leaves (non-dirs), we are done.
if !is1Dir && !is2Dir {
return nil
}
// Fetch the names of all the files contained in both directories being walked:
var names1, names2 []nameIno
if is1Dir {
names1, err = readdirnames(filepath.Join(w.dir1, path)) // getdents(2): fs access
if err != nil {
return err
}
}
if is2Dir {
names2, err = readdirnames(filepath.Join(w.dir2, path)) // getdents(2): fs access
if err != nil {
return err
}
}
// We have lists of the files contained in both parallel directories, sorted
// in the same order. Walk them in parallel, generating a unique merged list
// of all items present in either or both directories.
var names []string
ix1 := 0
ix2 := 0
for {
if ix1 >= len(names1) {
break
}
if ix2 >= len(names2) {
break
}
ni1 := names1[ix1]
ni2 := names2[ix2]
switch bytes.Compare([]byte(ni1.name), []byte(ni2.name)) {
case -1: // ni1 < ni2 -- advance ni1
// we will not encounter ni1 in names2
names = append(names, ni1.name)
ix1++
case 0: // ni1 == ni2
if ni1.ino != ni2.ino || !sameDevice {
names = append(names, ni1.name)
}
ix1++
ix2++
case 1: // ni1 > ni2 -- advance ni2
// we will not encounter ni2 in names1
names = append(names, ni2.name)
ix2++
}
}
for ix1 < len(names1) {
names = append(names, names1[ix1].name)
ix1++
}
for ix2 < len(names2) {
names = append(names, names2[ix2].name)
ix2++
}
// For each of the names present in either or both of the directories being
// iterated, stat the name under each root, and recurse the pair of them:
for _, name := range names {
fname := filepath.Join(path, name)
var cInfo1, cInfo2 os.FileInfo
if is1Dir {
cInfo1, err = os.Lstat(filepath.Join(w.dir1, fname)) // lstat(2): fs access
if err != nil && !os.IsNotExist(err) {
return err
}
}
if is2Dir {
cInfo2, err = os.Lstat(filepath.Join(w.dir2, fname)) // lstat(2): fs access
if err != nil && !os.IsNotExist(err) {
return err
}
}
if err = w.walk(fname, cInfo1, cInfo2); err != nil {
return err
}
}
return nil
}
// {name,inode} pairs used to support the early-pruning logic of the walker type
type nameIno struct {
name string
ino uint64
}
type nameInoSlice []nameIno
func (s nameInoSlice) Len() int { return len(s) }
func (s nameInoSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s nameInoSlice) Less(i, j int) bool { return s[i].name < s[j].name }
// readdirnames is a hacked-apart version of the Go stdlib code, exposing inode
// numbers further up the stack when reading directory contents. Unlike
// os.Readdirnames, which returns a list of filenames, this function returns a
// list of {filename,inode} pairs.
func readdirnames(dirname string) (names []nameIno, err error) {
var (
size = 100
buf = make([]byte, 4096)
nbuf int
bufp int
nb int
)
f, err := os.Open(dirname)
if err != nil {
return nil, err
}
defer f.Close()
names = make([]nameIno, 0, size) // Empty with room to grow.
for {
// Refill the buffer if necessary
if bufp >= nbuf {
bufp = 0
nbuf, err = syscall.ReadDirent(int(f.Fd()), buf) // getdents on linux
if nbuf < 0 {
nbuf = 0
}
if err != nil {
return nil, os.NewSyscallError("readdirent", err)
}
if nbuf <= 0 {
break // EOF
}
}
// Drain the buffer
nb, names = parseDirent(buf[bufp:nbuf], names)
bufp += nb
}
sl := nameInoSlice(names)
sort.Sort(sl)
return sl, nil
}
// parseDirent is a minor modification of syscall.ParseDirent (linux version)
// which returns {name,inode} pairs instead of just names.
func parseDirent(buf []byte, names []nameIno) (consumed int, newnames []nameIno) {
origlen := len(buf)
for len(buf) > 0 {
dirent := (*syscall.Dirent)(unsafe.Pointer(&buf[0]))
buf = buf[dirent.Reclen:]
if dirent.Ino == 0 { // File absent in directory.
continue
}
bytes := (*[10000]byte)(unsafe.Pointer(&dirent.Name[0]))
var name = string(bytes[0:clen(bytes[:])])
if name == "." || name == ".." { // Useless names
continue
}
names = append(names, nameIno{name, dirent.Ino})
}
return origlen - len(buf), names
}
func clen(n []byte) int {
for i := 0; i < len(n); i++ {
if n[i] == 0 {
return i
}
}
return len(n)
}
// OverlayChanges walks the path rw and determines changes for the files in the path,
// with respect to the parent layers
func OverlayChanges(layers []string, rw string) ([]Change, error) {
return changes(layers, rw, overlayDeletedFile, nil)
}
func overlayDeletedFile(root, path string, fi os.FileInfo) (string, error) {
if fi.Mode()&os.ModeCharDevice != 0 {
s := fi.Sys().(*syscall.Stat_t)
if major(uint64(s.Rdev)) == 0 && minor(uint64(s.Rdev)) == 0 {
return path, nil
}
}
if fi.Mode()&os.ModeDir != 0 {
opaque, err := system.Lgetxattr(filepath.Join(root, path), "trusted.overlay.opaque")
if err != nil {
return "", err
}
if opaque != nil && len(opaque) == 1 && opaque[0] == 'y' {
return path, nil
}
}
return "", nil
}

View file

@ -0,0 +1,97 @@
// +build !linux
package archive
import (
"fmt"
"os"
"path/filepath"
"runtime"
"strings"
"github.com/docker/docker/pkg/system"
)
func collectFileInfoForChanges(oldDir, newDir string) (*FileInfo, *FileInfo, error) {
var (
oldRoot, newRoot *FileInfo
err1, err2 error
errs = make(chan error, 2)
)
go func() {
oldRoot, err1 = collectFileInfo(oldDir)
errs <- err1
}()
go func() {
newRoot, err2 = collectFileInfo(newDir)
errs <- err2
}()
// block until both routines have returned
for i := 0; i < 2; i++ {
if err := <-errs; err != nil {
return nil, nil, err
}
}
return oldRoot, newRoot, nil
}
func collectFileInfo(sourceDir string) (*FileInfo, error) {
root := newRootFileInfo()
err := filepath.Walk(sourceDir, func(path string, f os.FileInfo, err error) error {
if err != nil {
return err
}
// Rebase path
relPath, err := filepath.Rel(sourceDir, path)
if err != nil {
return err
}
// As this runs on the daemon side, file paths are OS specific.
relPath = filepath.Join(string(os.PathSeparator), relPath)
// See https://github.com/golang/go/issues/9168 - bug in filepath.Join.
// Temporary workaround. If the returned path starts with two backslashes,
// trim it down to a single backslash. Only relevant on Windows.
if runtime.GOOS == "windows" {
if strings.HasPrefix(relPath, `\\`) {
relPath = relPath[1:]
}
}
if relPath == string(os.PathSeparator) {
return nil
}
parent := root.LookUp(filepath.Dir(relPath))
if parent == nil {
return fmt.Errorf("collectFileInfo: Unexpectedly no parent for %s", relPath)
}
info := &FileInfo{
name: filepath.Base(relPath),
children: make(map[string]*FileInfo),
parent: parent,
}
s, err := system.Lstat(path)
if err != nil {
return err
}
info.stat = s
info.capability, _ = system.Lgetxattr(path, "security.capability")
parent.children[info.name] = info
return nil
})
if err != nil {
return nil, err
}
return root, nil
}

View file

@ -0,0 +1,36 @@
// +build !windows
package archive
import (
"os"
"syscall"
"github.com/docker/docker/pkg/system"
)
func statDifferent(oldStat *system.StatT, newStat *system.StatT) bool {
// Don't look at size for dirs, its not a good measure of change
if oldStat.Mode() != newStat.Mode() ||
oldStat.UID() != newStat.UID() ||
oldStat.GID() != newStat.GID() ||
oldStat.Rdev() != newStat.Rdev() ||
// Don't look at size for dirs, its not a good measure of change
(oldStat.Mode()&syscall.S_IFDIR != syscall.S_IFDIR &&
(!sameFsTimeSpec(oldStat.Mtim(), newStat.Mtim()) || (oldStat.Size() != newStat.Size()))) {
return true
}
return false
}
func (info *FileInfo) isDir() bool {
return info.parent == nil || info.stat.Mode()&syscall.S_IFDIR != 0
}
func getIno(fi os.FileInfo) uint64 {
return uint64(fi.Sys().(*syscall.Stat_t).Ino)
}
func hasHardlinks(fi os.FileInfo) bool {
return fi.Sys().(*syscall.Stat_t).Nlink > 1
}

View file

@ -0,0 +1,30 @@
package archive
import (
"os"
"github.com/docker/docker/pkg/system"
)
func statDifferent(oldStat *system.StatT, newStat *system.StatT) bool {
// Don't look at size for dirs, its not a good measure of change
if oldStat.ModTime() != newStat.ModTime() ||
oldStat.Mode() != newStat.Mode() ||
oldStat.Size() != newStat.Size() && !oldStat.IsDir() {
return true
}
return false
}
func (info *FileInfo) isDir() bool {
return info.parent == nil || info.stat.IsDir()
}
func getIno(fi os.FileInfo) (inode uint64) {
return
}
func hasHardlinks(fi os.FileInfo) bool {
return false
}

View file

@ -0,0 +1,458 @@
package archive
import (
"archive/tar"
"errors"
"io"
"io/ioutil"
"os"
"path/filepath"
"strings"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/pkg/system"
)
// Errors used or returned by this file.
var (
ErrNotDirectory = errors.New("not a directory")
ErrDirNotExists = errors.New("no such directory")
ErrCannotCopyDir = errors.New("cannot copy directory")
ErrInvalidCopySource = errors.New("invalid copy source content")
)
// PreserveTrailingDotOrSeparator returns the given cleaned path (after
// processing using any utility functions from the path or filepath stdlib
// packages) and appends a trailing `/.` or `/` if its corresponding original
// path (from before being processed by utility functions from the path or
// filepath stdlib packages) ends with a trailing `/.` or `/`. If the cleaned
// path already ends in a `.` path segment, then another is not added. If the
// clean path already ends in a path separator, then another is not added.
func PreserveTrailingDotOrSeparator(cleanedPath, originalPath string) string {
// Ensure paths are in platform semantics
cleanedPath = normalizePath(cleanedPath)
originalPath = normalizePath(originalPath)
if !specifiesCurrentDir(cleanedPath) && specifiesCurrentDir(originalPath) {
if !hasTrailingPathSeparator(cleanedPath) {
// Add a separator if it doesn't already end with one (a cleaned
// path would only end in a separator if it is the root).
cleanedPath += string(filepath.Separator)
}
cleanedPath += "."
}
if !hasTrailingPathSeparator(cleanedPath) && hasTrailingPathSeparator(originalPath) {
cleanedPath += string(filepath.Separator)
}
return cleanedPath
}
// assertsDirectory returns whether the given path is
// asserted to be a directory, i.e., the path ends with
// a trailing '/' or `/.`, assuming a path separator of `/`.
func assertsDirectory(path string) bool {
return hasTrailingPathSeparator(path) || specifiesCurrentDir(path)
}
// hasTrailingPathSeparator returns whether the given
// path ends with the system's path separator character.
func hasTrailingPathSeparator(path string) bool {
return len(path) > 0 && os.IsPathSeparator(path[len(path)-1])
}
// specifiesCurrentDir returns whether the given path specifies
// a "current directory", i.e., the last path segment is `.`.
func specifiesCurrentDir(path string) bool {
return filepath.Base(path) == "."
}
// SplitPathDirEntry splits the given path between its directory name and its
// basename by first cleaning the path but preserves a trailing "." if the
// original path specified the current directory.
func SplitPathDirEntry(path string) (dir, base string) {
cleanedPath := filepath.Clean(normalizePath(path))
if specifiesCurrentDir(path) {
cleanedPath += string(filepath.Separator) + "."
}
return filepath.Dir(cleanedPath), filepath.Base(cleanedPath)
}
// TarResource archives the resource described by the given CopyInfo to a Tar
// archive. A non-nil error is returned if sourcePath does not exist or is
// asserted to be a directory but exists as another type of file.
//
// This function acts as a convenient wrapper around TarWithOptions, which
// requires a directory as the source path. TarResource accepts either a
// directory or a file path and correctly sets the Tar options.
func TarResource(sourceInfo CopyInfo) (content Archive, err error) {
return TarResourceRebase(sourceInfo.Path, sourceInfo.RebaseName)
}
// TarResourceRebase is like TarResource but renames the first path element of
// items in the resulting tar archive to match the given rebaseName if not "".
func TarResourceRebase(sourcePath, rebaseName string) (content Archive, err error) {
sourcePath = normalizePath(sourcePath)
if _, err = os.Lstat(sourcePath); err != nil {
// Catches the case where the source does not exist or is not a
// directory if asserted to be a directory, as this also causes an
// error.
return
}
// Separate the source path between its directory and
// the entry in that directory which we are archiving.
sourceDir, sourceBase := SplitPathDirEntry(sourcePath)
filter := []string{sourceBase}
logrus.Debugf("copying %q from %q", sourceBase, sourceDir)
return TarWithOptions(sourceDir, &TarOptions{
Compression: Uncompressed,
IncludeFiles: filter,
IncludeSourceDir: true,
RebaseNames: map[string]string{
sourceBase: rebaseName,
},
})
}
// CopyInfo holds basic info about the source
// or destination path of a copy operation.
type CopyInfo struct {
Path string
Exists bool
IsDir bool
RebaseName string
}
// CopyInfoSourcePath stats the given path to create a CopyInfo
// struct representing that resource for the source of an archive copy
// operation. The given path should be an absolute local path. A source path
// has all symlinks evaluated that appear before the last path separator ("/"
// on Unix). As it is to be a copy source, the path must exist.
func CopyInfoSourcePath(path string, followLink bool) (CopyInfo, error) {
// normalize the file path and then evaluate the symbol link
// we will use the target file instead of the symbol link if
// followLink is set
path = normalizePath(path)
resolvedPath, rebaseName, err := ResolveHostSourcePath(path, followLink)
if err != nil {
return CopyInfo{}, err
}
stat, err := os.Lstat(resolvedPath)
if err != nil {
return CopyInfo{}, err
}
return CopyInfo{
Path: resolvedPath,
Exists: true,
IsDir: stat.IsDir(),
RebaseName: rebaseName,
}, nil
}
// CopyInfoDestinationPath stats the given path to create a CopyInfo
// struct representing that resource for the destination of an archive copy
// operation. The given path should be an absolute local path.
func CopyInfoDestinationPath(path string) (info CopyInfo, err error) {
maxSymlinkIter := 10 // filepath.EvalSymlinks uses 255, but 10 already seems like a lot.
path = normalizePath(path)
originalPath := path
stat, err := os.Lstat(path)
if err == nil && stat.Mode()&os.ModeSymlink == 0 {
// The path exists and is not a symlink.
return CopyInfo{
Path: path,
Exists: true,
IsDir: stat.IsDir(),
}, nil
}
// While the path is a symlink.
for n := 0; err == nil && stat.Mode()&os.ModeSymlink != 0; n++ {
if n > maxSymlinkIter {
// Don't follow symlinks more than this arbitrary number of times.
return CopyInfo{}, errors.New("too many symlinks in " + originalPath)
}
// The path is a symbolic link. We need to evaluate it so that the
// destination of the copy operation is the link target and not the
// link itself. This is notably different than CopyInfoSourcePath which
// only evaluates symlinks before the last appearing path separator.
// Also note that it is okay if the last path element is a broken
// symlink as the copy operation should create the target.
var linkTarget string
linkTarget, err = os.Readlink(path)
if err != nil {
return CopyInfo{}, err
}
if !system.IsAbs(linkTarget) {
// Join with the parent directory.
dstParent, _ := SplitPathDirEntry(path)
linkTarget = filepath.Join(dstParent, linkTarget)
}
path = linkTarget
stat, err = os.Lstat(path)
}
if err != nil {
// It's okay if the destination path doesn't exist. We can still
// continue the copy operation if the parent directory exists.
if !os.IsNotExist(err) {
return CopyInfo{}, err
}
// Ensure destination parent dir exists.
dstParent, _ := SplitPathDirEntry(path)
parentDirStat, err := os.Lstat(dstParent)
if err != nil {
return CopyInfo{}, err
}
if !parentDirStat.IsDir() {
return CopyInfo{}, ErrNotDirectory
}
return CopyInfo{Path: path}, nil
}
// The path exists after resolving symlinks.
return CopyInfo{
Path: path,
Exists: true,
IsDir: stat.IsDir(),
}, nil
}
// PrepareArchiveCopy prepares the given srcContent archive, which should
// contain the archived resource described by srcInfo, to the destination
// described by dstInfo. Returns the possibly modified content archive along
// with the path to the destination directory which it should be extracted to.
func PrepareArchiveCopy(srcContent Reader, srcInfo, dstInfo CopyInfo) (dstDir string, content Archive, err error) {
// Ensure in platform semantics
srcInfo.Path = normalizePath(srcInfo.Path)
dstInfo.Path = normalizePath(dstInfo.Path)
// Separate the destination path between its directory and base
// components in case the source archive contents need to be rebased.
dstDir, dstBase := SplitPathDirEntry(dstInfo.Path)
_, srcBase := SplitPathDirEntry(srcInfo.Path)
switch {
case dstInfo.Exists && dstInfo.IsDir:
// The destination exists as a directory. No alteration
// to srcContent is needed as its contents can be
// simply extracted to the destination directory.
return dstInfo.Path, ioutil.NopCloser(srcContent), nil
case dstInfo.Exists && srcInfo.IsDir:
// The destination exists as some type of file and the source
// content is a directory. This is an error condition since
// you cannot copy a directory to an existing file location.
return "", nil, ErrCannotCopyDir
case dstInfo.Exists:
// The destination exists as some type of file and the source content
// is also a file. The source content entry will have to be renamed to
// have a basename which matches the destination path's basename.
if len(srcInfo.RebaseName) != 0 {
srcBase = srcInfo.RebaseName
}
return dstDir, RebaseArchiveEntries(srcContent, srcBase, dstBase), nil
case srcInfo.IsDir:
// The destination does not exist and the source content is an archive
// of a directory. The archive should be extracted to the parent of
// the destination path instead, and when it is, the directory that is
// created as a result should take the name of the destination path.
// The source content entries will have to be renamed to have a
// basename which matches the destination path's basename.
if len(srcInfo.RebaseName) != 0 {
srcBase = srcInfo.RebaseName
}
return dstDir, RebaseArchiveEntries(srcContent, srcBase, dstBase), nil
case assertsDirectory(dstInfo.Path):
// The destination does not exist and is asserted to be created as a
// directory, but the source content is not a directory. This is an
// error condition since you cannot create a directory from a file
// source.
return "", nil, ErrDirNotExists
default:
// The last remaining case is when the destination does not exist, is
// not asserted to be a directory, and the source content is not an
// archive of a directory. It this case, the destination file will need
// to be created when the archive is extracted and the source content
// entry will have to be renamed to have a basename which matches the
// destination path's basename.
if len(srcInfo.RebaseName) != 0 {
srcBase = srcInfo.RebaseName
}
return dstDir, RebaseArchiveEntries(srcContent, srcBase, dstBase), nil
}
}
// RebaseArchiveEntries rewrites the given srcContent archive replacing
// an occurrence of oldBase with newBase at the beginning of entry names.
func RebaseArchiveEntries(srcContent Reader, oldBase, newBase string) Archive {
if oldBase == string(os.PathSeparator) {
// If oldBase specifies the root directory, use an empty string as
// oldBase instead so that newBase doesn't replace the path separator
// that all paths will start with.
oldBase = ""
}
rebased, w := io.Pipe()
go func() {
srcTar := tar.NewReader(srcContent)
rebasedTar := tar.NewWriter(w)
for {
hdr, err := srcTar.Next()
if err == io.EOF {
// Signals end of archive.
rebasedTar.Close()
w.Close()
return
}
if err != nil {
w.CloseWithError(err)
return
}
hdr.Name = strings.Replace(hdr.Name, oldBase, newBase, 1)
if err = rebasedTar.WriteHeader(hdr); err != nil {
w.CloseWithError(err)
return
}
if _, err = io.Copy(rebasedTar, srcTar); err != nil {
w.CloseWithError(err)
return
}
}
}()
return rebased
}
// CopyResource performs an archive copy from the given source path to the
// given destination path. The source path MUST exist and the destination
// path's parent directory must exist.
func CopyResource(srcPath, dstPath string, followLink bool) error {
var (
srcInfo CopyInfo
err error
)
// Ensure in platform semantics
srcPath = normalizePath(srcPath)
dstPath = normalizePath(dstPath)
// Clean the source and destination paths.
srcPath = PreserveTrailingDotOrSeparator(filepath.Clean(srcPath), srcPath)
dstPath = PreserveTrailingDotOrSeparator(filepath.Clean(dstPath), dstPath)
if srcInfo, err = CopyInfoSourcePath(srcPath, followLink); err != nil {
return err
}
content, err := TarResource(srcInfo)
if err != nil {
return err
}
defer content.Close()
return CopyTo(content, srcInfo, dstPath)
}
// CopyTo handles extracting the given content whose
// entries should be sourced from srcInfo to dstPath.
func CopyTo(content Reader, srcInfo CopyInfo, dstPath string) error {
// The destination path need not exist, but CopyInfoDestinationPath will
// ensure that at least the parent directory exists.
dstInfo, err := CopyInfoDestinationPath(normalizePath(dstPath))
if err != nil {
return err
}
dstDir, copyArchive, err := PrepareArchiveCopy(content, srcInfo, dstInfo)
if err != nil {
return err
}
defer copyArchive.Close()
options := &TarOptions{
NoLchown: true,
NoOverwriteDirNonDir: true,
}
return Untar(copyArchive, dstDir, options)
}
// ResolveHostSourcePath decides real path need to be copied with parameters such as
// whether to follow symbol link or not, if followLink is true, resolvedPath will return
// link target of any symbol link file, else it will only resolve symlink of directory
// but return symbol link file itself without resolving.
func ResolveHostSourcePath(path string, followLink bool) (resolvedPath, rebaseName string, err error) {
if followLink {
resolvedPath, err = filepath.EvalSymlinks(path)
if err != nil {
return
}
resolvedPath, rebaseName = GetRebaseName(path, resolvedPath)
} else {
dirPath, basePath := filepath.Split(path)
// if not follow symbol link, then resolve symbol link of parent dir
var resolvedDirPath string
resolvedDirPath, err = filepath.EvalSymlinks(dirPath)
if err != nil {
return
}
// resolvedDirPath will have been cleaned (no trailing path separators) so
// we can manually join it with the base path element.
resolvedPath = resolvedDirPath + string(filepath.Separator) + basePath
if hasTrailingPathSeparator(path) && filepath.Base(path) != filepath.Base(resolvedPath) {
rebaseName = filepath.Base(path)
}
}
return resolvedPath, rebaseName, nil
}
// GetRebaseName normalizes and compares path and resolvedPath,
// return completed resolved path and rebased file name
func GetRebaseName(path, resolvedPath string) (string, string) {
// linkTarget will have been cleaned (no trailing path separators and dot) so
// we can manually join it with them
var rebaseName string
if specifiesCurrentDir(path) && !specifiesCurrentDir(resolvedPath) {
resolvedPath += string(filepath.Separator) + "."
}
if hasTrailingPathSeparator(path) && !hasTrailingPathSeparator(resolvedPath) {
resolvedPath += string(filepath.Separator)
}
if filepath.Base(path) != filepath.Base(resolvedPath) {
// In the case where the path had a trailing separator and a symlink
// evaluation has changed the last path component, we will need to
// rebase the name in the archive that is being copied to match the
// originally requested name.
rebaseName = filepath.Base(path)
}
return resolvedPath, rebaseName
}

View file

@ -0,0 +1,11 @@
// +build !windows
package archive
import (
"path/filepath"
)
func normalizePath(path string) string {
return filepath.ToSlash(path)
}

View file

@ -0,0 +1,9 @@
package archive
import (
"path/filepath"
)
func normalizePath(path string) string {
return filepath.FromSlash(path)
}

View file

@ -0,0 +1,279 @@
package archive
import (
"archive/tar"
"fmt"
"io"
"io/ioutil"
"os"
"path/filepath"
"runtime"
"strings"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/pkg/idtools"
"github.com/docker/docker/pkg/pools"
"github.com/docker/docker/pkg/system"
)
// UnpackLayer unpack `layer` to a `dest`. The stream `layer` can be
// compressed or uncompressed.
// Returns the size in bytes of the contents of the layer.
func UnpackLayer(dest string, layer Reader, options *TarOptions) (size int64, err error) {
tr := tar.NewReader(layer)
trBuf := pools.BufioReader32KPool.Get(tr)
defer pools.BufioReader32KPool.Put(trBuf)
var dirs []*tar.Header
unpackedPaths := make(map[string]struct{})
if options == nil {
options = &TarOptions{}
}
if options.ExcludePatterns == nil {
options.ExcludePatterns = []string{}
}
remappedRootUID, remappedRootGID, err := idtools.GetRootUIDGID(options.UIDMaps, options.GIDMaps)
if err != nil {
return 0, err
}
aufsTempdir := ""
aufsHardlinks := make(map[string]*tar.Header)
if options == nil {
options = &TarOptions{}
}
// Iterate through the files in the archive.
for {
hdr, err := tr.Next()
if err == io.EOF {
// end of tar archive
break
}
if err != nil {
return 0, err
}
size += hdr.Size
// Normalize name, for safety and for a simple is-root check
hdr.Name = filepath.Clean(hdr.Name)
// Windows does not support filenames with colons in them. Ignore
// these files. This is not a problem though (although it might
// appear that it is). Let's suppose a client is running docker pull.
// The daemon it points to is Windows. Would it make sense for the
// client to be doing a docker pull Ubuntu for example (which has files
// with colons in the name under /usr/share/man/man3)? No, absolutely
// not as it would really only make sense that they were pulling a
// Windows image. However, for development, it is necessary to be able
// to pull Linux images which are in the repository.
//
// TODO Windows. Once the registry is aware of what images are Windows-
// specific or Linux-specific, this warning should be changed to an error
// to cater for the situation where someone does manage to upload a Linux
// image but have it tagged as Windows inadvertently.
if runtime.GOOS == "windows" {
if strings.Contains(hdr.Name, ":") {
logrus.Warnf("Windows: Ignoring %s (is this a Linux image?)", hdr.Name)
continue
}
}
// Note as these operations are platform specific, so must the slash be.
if !strings.HasSuffix(hdr.Name, string(os.PathSeparator)) {
// Not the root directory, ensure that the parent directory exists.
// This happened in some tests where an image had a tarfile without any
// parent directories.
parent := filepath.Dir(hdr.Name)
parentPath := filepath.Join(dest, parent)
if _, err := os.Lstat(parentPath); err != nil && os.IsNotExist(err) {
err = system.MkdirAll(parentPath, 0600)
if err != nil {
return 0, err
}
}
}
// Skip AUFS metadata dirs
if strings.HasPrefix(hdr.Name, WhiteoutMetaPrefix) {
// Regular files inside /.wh..wh.plnk can be used as hardlink targets
// We don't want this directory, but we need the files in them so that
// such hardlinks can be resolved.
if strings.HasPrefix(hdr.Name, WhiteoutLinkDir) && hdr.Typeflag == tar.TypeReg {
basename := filepath.Base(hdr.Name)
aufsHardlinks[basename] = hdr
if aufsTempdir == "" {
if aufsTempdir, err = ioutil.TempDir("", "dockerplnk"); err != nil {
return 0, err
}
defer os.RemoveAll(aufsTempdir)
}
if err := createTarFile(filepath.Join(aufsTempdir, basename), dest, hdr, tr, true, nil); err != nil {
return 0, err
}
}
if hdr.Name != WhiteoutOpaqueDir {
continue
}
}
path := filepath.Join(dest, hdr.Name)
rel, err := filepath.Rel(dest, path)
if err != nil {
return 0, err
}
// Note as these operations are platform specific, so must the slash be.
if strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
return 0, breakoutError(fmt.Errorf("%q is outside of %q", hdr.Name, dest))
}
base := filepath.Base(path)
if strings.HasPrefix(base, WhiteoutPrefix) {
dir := filepath.Dir(path)
if base == WhiteoutOpaqueDir {
_, err := os.Lstat(dir)
if err != nil {
return 0, err
}
err = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
if err != nil {
if os.IsNotExist(err) {
err = nil // parent was deleted
}
return err
}
if path == dir {
return nil
}
if _, exists := unpackedPaths[path]; !exists {
err := os.RemoveAll(path)
return err
}
return nil
})
if err != nil {
return 0, err
}
} else {
originalBase := base[len(WhiteoutPrefix):]
originalPath := filepath.Join(dir, originalBase)
if err := os.RemoveAll(originalPath); err != nil {
return 0, err
}
}
} else {
// If path exits we almost always just want to remove and replace it.
// The only exception is when it is a directory *and* the file from
// the layer is also a directory. Then we want to merge them (i.e.
// just apply the metadata from the layer).
if fi, err := os.Lstat(path); err == nil {
if !(fi.IsDir() && hdr.Typeflag == tar.TypeDir) {
if err := os.RemoveAll(path); err != nil {
return 0, err
}
}
}
trBuf.Reset(tr)
srcData := io.Reader(trBuf)
srcHdr := hdr
// Hard links into /.wh..wh.plnk don't work, as we don't extract that directory, so
// we manually retarget these into the temporary files we extracted them into
if hdr.Typeflag == tar.TypeLink && strings.HasPrefix(filepath.Clean(hdr.Linkname), WhiteoutLinkDir) {
linkBasename := filepath.Base(hdr.Linkname)
srcHdr = aufsHardlinks[linkBasename]
if srcHdr == nil {
return 0, fmt.Errorf("Invalid aufs hardlink")
}
tmpFile, err := os.Open(filepath.Join(aufsTempdir, linkBasename))
if err != nil {
return 0, err
}
defer tmpFile.Close()
srcData = tmpFile
}
// if the options contain a uid & gid maps, convert header uid/gid
// entries using the maps such that lchown sets the proper mapped
// uid/gid after writing the file. We only perform this mapping if
// the file isn't already owned by the remapped root UID or GID, as
// that specific uid/gid has no mapping from container -> host, and
// those files already have the proper ownership for inside the
// container.
if srcHdr.Uid != remappedRootUID {
xUID, err := idtools.ToHost(srcHdr.Uid, options.UIDMaps)
if err != nil {
return 0, err
}
srcHdr.Uid = xUID
}
if srcHdr.Gid != remappedRootGID {
xGID, err := idtools.ToHost(srcHdr.Gid, options.GIDMaps)
if err != nil {
return 0, err
}
srcHdr.Gid = xGID
}
if err := createTarFile(path, dest, srcHdr, srcData, true, nil); err != nil {
return 0, err
}
// Directory mtimes must be handled at the end to avoid further
// file creation in them to modify the directory mtime
if hdr.Typeflag == tar.TypeDir {
dirs = append(dirs, hdr)
}
unpackedPaths[path] = struct{}{}
}
}
for _, hdr := range dirs {
path := filepath.Join(dest, hdr.Name)
if err := system.Chtimes(path, hdr.AccessTime, hdr.ModTime); err != nil {
return 0, err
}
}
return size, nil
}
// ApplyLayer parses a diff in the standard layer format from `layer`,
// and applies it to the directory `dest`. The stream `layer` can be
// compressed or uncompressed.
// Returns the size in bytes of the contents of the layer.
func ApplyLayer(dest string, layer Reader) (int64, error) {
return applyLayerHandler(dest, layer, &TarOptions{}, true)
}
// ApplyUncompressedLayer parses a diff in the standard layer format from
// `layer`, and applies it to the directory `dest`. The stream `layer`
// can only be uncompressed.
// Returns the size in bytes of the contents of the layer.
func ApplyUncompressedLayer(dest string, layer Reader, options *TarOptions) (int64, error) {
return applyLayerHandler(dest, layer, options, false)
}
// do the bulk load of ApplyLayer, but allow for not calling DecompressStream
func applyLayerHandler(dest string, layer Reader, options *TarOptions, decompress bool) (int64, error) {
dest = filepath.Clean(dest)
// We need to be able to set any perms
oldmask, err := system.Umask(0)
if err != nil {
return 0, err
}
defer system.Umask(oldmask) // ignore err, ErrNotSupportedPlatform
if decompress {
layer, err = DecompressStream(layer)
if err != nil {
return 0, err
}
}
return UnpackLayer(dest, layer, options)
}

View file

@ -0,0 +1,97 @@
// +build ignore
// Simple tool to create an archive stream from an old and new directory
//
// By default it will stream the comparison of two temporary directories with junk files
package main
import (
"flag"
"fmt"
"io"
"io/ioutil"
"os"
"path"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/pkg/archive"
)
var (
flDebug = flag.Bool("D", false, "debugging output")
flNewDir = flag.String("newdir", "", "")
flOldDir = flag.String("olddir", "", "")
log = logrus.New()
)
func main() {
flag.Usage = func() {
fmt.Println("Produce a tar from comparing two directory paths. By default a demo tar is created of around 200 files (including hardlinks)")
fmt.Printf("%s [OPTIONS]\n", os.Args[0])
flag.PrintDefaults()
}
flag.Parse()
log.Out = os.Stderr
if (len(os.Getenv("DEBUG")) > 0) || *flDebug {
logrus.SetLevel(logrus.DebugLevel)
}
var newDir, oldDir string
if len(*flNewDir) == 0 {
var err error
newDir, err = ioutil.TempDir("", "docker-test-newDir")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(newDir)
if _, err := prepareUntarSourceDirectory(100, newDir, true); err != nil {
log.Fatal(err)
}
} else {
newDir = *flNewDir
}
if len(*flOldDir) == 0 {
oldDir, err := ioutil.TempDir("", "docker-test-oldDir")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(oldDir)
} else {
oldDir = *flOldDir
}
changes, err := archive.ChangesDirs(newDir, oldDir)
if err != nil {
log.Fatal(err)
}
a, err := archive.ExportChanges(newDir, changes)
if err != nil {
log.Fatal(err)
}
defer a.Close()
i, err := io.Copy(os.Stdout, a)
if err != nil && err != io.EOF {
log.Fatal(err)
}
fmt.Fprintf(os.Stderr, "wrote archive of %d bytes", i)
}
func prepareUntarSourceDirectory(numberOfFiles int, targetPath string, makeLinks bool) (int, error) {
fileData := []byte("fooo")
for n := 0; n < numberOfFiles; n++ {
fileName := fmt.Sprintf("file-%d", n)
if err := ioutil.WriteFile(path.Join(targetPath, fileName), fileData, 0700); err != nil {
return 0, err
}
if makeLinks {
if err := os.Link(path.Join(targetPath, fileName), path.Join(targetPath, fileName+"-link")); err != nil {
return 0, err
}
}
}
totalSize := numberOfFiles * len(fileData)
return totalSize, nil
}

View file

@ -0,0 +1,16 @@
package archive
import (
"syscall"
"time"
)
func timeToTimespec(time time.Time) (ts syscall.Timespec) {
if time.IsZero() {
// Return UTIME_OMIT special value
ts.Sec = 0
ts.Nsec = ((1 << 30) - 2)
return
}
return syscall.NsecToTimespec(time.UnixNano())
}

View file

@ -0,0 +1,16 @@
// +build !linux
package archive
import (
"syscall"
"time"
)
func timeToTimespec(time time.Time) (ts syscall.Timespec) {
nsec := int64(0)
if !time.IsZero() {
nsec = time.UnixNano()
}
return syscall.NsecToTimespec(nsec)
}

View file

@ -0,0 +1,23 @@
package archive
// Whiteouts are files with a special meaning for the layered filesystem.
// Docker uses AUFS whiteout files inside exported archives. In other
// filesystems these files are generated/handled on tar creation/extraction.
// WhiteoutPrefix prefix means file is a whiteout. If this is followed by a
// filename this means that file has been removed from the base layer.
const WhiteoutPrefix = ".wh."
// WhiteoutMetaPrefix prefix means whiteout has a special meaning and is not
// for removing an actual file. Normally these files are excluded from exported
// archives.
const WhiteoutMetaPrefix = WhiteoutPrefix + WhiteoutPrefix
// WhiteoutLinkDir is a directory AUFS uses for storing hardlink links to other
// layers. Normally these should not go into exported archives and all changed
// hardlinks should be copied to the top layer.
const WhiteoutLinkDir = WhiteoutMetaPrefix + "plnk"
// WhiteoutOpaqueDir file means directory has been made opaque - meaning
// readdir calls to this directory do not follow to lower layers.
const WhiteoutOpaqueDir = WhiteoutMetaPrefix + ".opq"

View file

@ -0,0 +1,59 @@
package archive
import (
"archive/tar"
"bytes"
"io/ioutil"
)
// Generate generates a new archive from the content provided
// as input.
//
// `files` is a sequence of path/content pairs. A new file is
// added to the archive for each pair.
// If the last pair is incomplete, the file is created with an
// empty content. For example:
//
// Generate("foo.txt", "hello world", "emptyfile")
//
// The above call will return an archive with 2 files:
// * ./foo.txt with content "hello world"
// * ./empty with empty content
//
// FIXME: stream content instead of buffering
// FIXME: specify permissions and other archive metadata
func Generate(input ...string) (Archive, error) {
files := parseStringPairs(input...)
buf := new(bytes.Buffer)
tw := tar.NewWriter(buf)
for _, file := range files {
name, content := file[0], file[1]
hdr := &tar.Header{
Name: name,
Size: int64(len(content)),
}
if err := tw.WriteHeader(hdr); err != nil {
return nil, err
}
if _, err := tw.Write([]byte(content)); err != nil {
return nil, err
}
}
if err := tw.Close(); err != nil {
return nil, err
}
return ioutil.NopCloser(buf), nil
}
func parseStringPairs(input ...string) (output [][2]string) {
output = make([][2]string, 0, len(input)/2+1)
for i := 0; i < len(input); i += 2 {
var pair [2]string
pair[0] = input[i]
if i+1 < len(input) {
pair[1] = input[i+1]
}
output = append(output, pair)
}
return
}

View file

@ -0,0 +1,283 @@
package fileutils
import (
"errors"
"fmt"
"io"
"os"
"path/filepath"
"regexp"
"strings"
"text/scanner"
"github.com/Sirupsen/logrus"
)
// exclusion returns true if the specified pattern is an exclusion
func exclusion(pattern string) bool {
return pattern[0] == '!'
}
// empty returns true if the specified pattern is empty
func empty(pattern string) bool {
return pattern == ""
}
// CleanPatterns takes a slice of patterns returns a new
// slice of patterns cleaned with filepath.Clean, stripped
// of any empty patterns and lets the caller know whether the
// slice contains any exception patterns (prefixed with !).
func CleanPatterns(patterns []string) ([]string, [][]string, bool, error) {
// Loop over exclusion patterns and:
// 1. Clean them up.
// 2. Indicate whether we are dealing with any exception rules.
// 3. Error if we see a single exclusion marker on its own (!).
cleanedPatterns := []string{}
patternDirs := [][]string{}
exceptions := false
for _, pattern := range patterns {
// Eliminate leading and trailing whitespace.
pattern = strings.TrimSpace(pattern)
if empty(pattern) {
continue
}
if exclusion(pattern) {
if len(pattern) == 1 {
return nil, nil, false, errors.New("Illegal exclusion pattern: !")
}
exceptions = true
}
pattern = filepath.Clean(pattern)
cleanedPatterns = append(cleanedPatterns, pattern)
if exclusion(pattern) {
pattern = pattern[1:]
}
patternDirs = append(patternDirs, strings.Split(pattern, string(os.PathSeparator)))
}
return cleanedPatterns, patternDirs, exceptions, nil
}
// Matches returns true if file matches any of the patterns
// and isn't excluded by any of the subsequent patterns.
func Matches(file string, patterns []string) (bool, error) {
file = filepath.Clean(file)
if file == "." {
// Don't let them exclude everything, kind of silly.
return false, nil
}
patterns, patDirs, _, err := CleanPatterns(patterns)
if err != nil {
return false, err
}
return OptimizedMatches(file, patterns, patDirs)
}
// OptimizedMatches is basically the same as fileutils.Matches() but optimized for archive.go.
// It will assume that the inputs have been preprocessed and therefore the function
// doesn't need to do as much error checking and clean-up. This was done to avoid
// repeating these steps on each file being checked during the archive process.
// The more generic fileutils.Matches() can't make these assumptions.
func OptimizedMatches(file string, patterns []string, patDirs [][]string) (bool, error) {
matched := false
file = filepath.FromSlash(file)
parentPath := filepath.Dir(file)
parentPathDirs := strings.Split(parentPath, string(os.PathSeparator))
for i, pattern := range patterns {
negative := false
if exclusion(pattern) {
negative = true
pattern = pattern[1:]
}
match, err := regexpMatch(pattern, file)
if err != nil {
return false, fmt.Errorf("Error in pattern (%s): %s", pattern, err)
}
if !match && parentPath != "." {
// Check to see if the pattern matches one of our parent dirs.
if len(patDirs[i]) <= len(parentPathDirs) {
match, _ = regexpMatch(strings.Join(patDirs[i], string(os.PathSeparator)),
strings.Join(parentPathDirs[:len(patDirs[i])], string(os.PathSeparator)))
}
}
if match {
matched = !negative
}
}
if matched {
logrus.Debugf("Skipping excluded path: %s", file)
}
return matched, nil
}
// regexpMatch tries to match the logic of filepath.Match but
// does so using regexp logic. We do this so that we can expand the
// wildcard set to include other things, like "**" to mean any number
// of directories. This means that we should be backwards compatible
// with filepath.Match(). We'll end up supporting more stuff, due to
// the fact that we're using regexp, but that's ok - it does no harm.
//
// As per the comment in golangs filepath.Match, on Windows, escaping
// is disabled. Instead, '\\' is treated as path separator.
func regexpMatch(pattern, path string) (bool, error) {
regStr := "^"
// Do some syntax checking on the pattern.
// filepath's Match() has some really weird rules that are inconsistent
// so instead of trying to dup their logic, just call Match() for its
// error state and if there is an error in the pattern return it.
// If this becomes an issue we can remove this since its really only
// needed in the error (syntax) case - which isn't really critical.
if _, err := filepath.Match(pattern, path); err != nil {
return false, err
}
// Go through the pattern and convert it to a regexp.
// We use a scanner so we can support utf-8 chars.
var scan scanner.Scanner
scan.Init(strings.NewReader(pattern))
sl := string(os.PathSeparator)
escSL := sl
if sl == `\` {
escSL += `\`
}
for scan.Peek() != scanner.EOF {
ch := scan.Next()
if ch == '*' {
if scan.Peek() == '*' {
// is some flavor of "**"
scan.Next()
if scan.Peek() == scanner.EOF {
// is "**EOF" - to align with .gitignore just accept all
regStr += ".*"
} else {
// is "**"
regStr += "((.*" + escSL + ")|([^" + escSL + "]*))"
}
// Treat **/ as ** so eat the "/"
if string(scan.Peek()) == sl {
scan.Next()
}
} else {
// is "*" so map it to anything but "/"
regStr += "[^" + escSL + "]*"
}
} else if ch == '?' {
// "?" is any char except "/"
regStr += "[^" + escSL + "]"
} else if strings.Index(".$", string(ch)) != -1 {
// Escape some regexp special chars that have no meaning
// in golang's filepath.Match
regStr += `\` + string(ch)
} else if ch == '\\' {
// escape next char. Note that a trailing \ in the pattern
// will be left alone (but need to escape it)
if sl == `\` {
// On windows map "\" to "\\", meaning an escaped backslash,
// and then just continue because filepath.Match on
// Windows doesn't allow escaping at all
regStr += escSL
continue
}
if scan.Peek() != scanner.EOF {
regStr += `\` + string(scan.Next())
} else {
regStr += `\`
}
} else {
regStr += string(ch)
}
}
regStr += "$"
res, err := regexp.MatchString(regStr, path)
// Map regexp's error to filepath's so no one knows we're not using filepath
if err != nil {
err = filepath.ErrBadPattern
}
return res, err
}
// CopyFile copies from src to dst until either EOF is reached
// on src or an error occurs. It verifies src exists and removes
// the dst if it exists.
func CopyFile(src, dst string) (int64, error) {
cleanSrc := filepath.Clean(src)
cleanDst := filepath.Clean(dst)
if cleanSrc == cleanDst {
return 0, nil
}
sf, err := os.Open(cleanSrc)
if err != nil {
return 0, err
}
defer sf.Close()
if err := os.Remove(cleanDst); err != nil && !os.IsNotExist(err) {
return 0, err
}
df, err := os.Create(cleanDst)
if err != nil {
return 0, err
}
defer df.Close()
return io.Copy(df, sf)
}
// ReadSymlinkedDirectory returns the target directory of a symlink.
// The target of the symbolic link may not be a file.
func ReadSymlinkedDirectory(path string) (string, error) {
var realPath string
var err error
if realPath, err = filepath.Abs(path); err != nil {
return "", fmt.Errorf("unable to get absolute path for %s: %s", path, err)
}
if realPath, err = filepath.EvalSymlinks(realPath); err != nil {
return "", fmt.Errorf("failed to canonicalise path for %s: %s", path, err)
}
realPathInfo, err := os.Stat(realPath)
if err != nil {
return "", fmt.Errorf("failed to stat target '%s' of '%s': %s", realPath, path, err)
}
if !realPathInfo.Mode().IsDir() {
return "", fmt.Errorf("canonical path points to a file '%s'", realPath)
}
return realPath, nil
}
// CreateIfNotExists creates a file or a directory only if it does not already exist.
func CreateIfNotExists(path string, isDir bool) error {
if _, err := os.Stat(path); err != nil {
if os.IsNotExist(err) {
if isDir {
return os.MkdirAll(path, 0755)
}
if err := os.MkdirAll(filepath.Dir(path), 0755); err != nil {
return err
}
f, err := os.OpenFile(path, os.O_CREATE, 0755)
if err != nil {
return err
}
f.Close()
}
}
return nil
}

View file

@ -0,0 +1,27 @@
package fileutils
import (
"os"
"os/exec"
"strconv"
"strings"
)
// GetTotalUsedFds returns the number of used File Descriptors by
// executing `lsof -p PID`
func GetTotalUsedFds() int {
pid := os.Getpid()
cmd := exec.Command("lsof", "-p", strconv.Itoa(pid))
output, err := cmd.CombinedOutput()
if err != nil {
return -1
}
outputStr := strings.TrimSpace(string(output))
fds := strings.Split(outputStr, "\n")
return len(fds) - 1
}

View file

@ -0,0 +1,7 @@
package fileutils
// GetTotalUsedFds Returns the number of used File Descriptors.
// On Solaris these limits are per process and not systemwide
func GetTotalUsedFds() int {
return -1
}

View file

@ -0,0 +1,22 @@
// +build linux freebsd
package fileutils
import (
"fmt"
"io/ioutil"
"os"
"github.com/Sirupsen/logrus"
)
// GetTotalUsedFds Returns the number of used File Descriptors by
// reading it via /proc filesystem.
func GetTotalUsedFds() int {
if fds, err := ioutil.ReadDir(fmt.Sprintf("/proc/%d/fd", os.Getpid())); err != nil {
logrus.Errorf("Error opening /proc/%d/fd: %s", os.Getpid(), err)
} else {
return len(fds)
}
return -1
}

View file

@ -0,0 +1,7 @@
package fileutils
// GetTotalUsedFds Returns the number of used File Descriptors. Not supported
// on Windows.
func GetTotalUsedFds() int {
return -1
}

View file

@ -0,0 +1,39 @@
package homedir
import (
"os"
"runtime"
"github.com/opencontainers/runc/libcontainer/user"
)
// Key returns the env var name for the user's home dir based on
// the platform being run on
func Key() string {
if runtime.GOOS == "windows" {
return "USERPROFILE"
}
return "HOME"
}
// Get returns the home directory of the current user with the help of
// environment variables depending on the target operating system.
// Returned path should be used with "path/filepath" to form new paths.
func Get() string {
home := os.Getenv(Key())
if home == "" && runtime.GOOS != "windows" {
if u, err := user.CurrentUser(); err == nil {
return u.Home
}
}
return home
}
// GetShortcutString returns the string that is shortcut to user's home directory
// in the native shell of the platform running on.
func GetShortcutString() string {
if runtime.GOOS == "windows" {
return "%USERPROFILE%" // be careful while using in format functions
}
return "~"
}

View file

@ -0,0 +1,197 @@
package idtools
import (
"bufio"
"fmt"
"os"
"sort"
"strconv"
"strings"
)
// IDMap contains a single entry for user namespace range remapping. An array
// of IDMap entries represents the structure that will be provided to the Linux
// kernel for creating a user namespace.
type IDMap struct {
ContainerID int `json:"container_id"`
HostID int `json:"host_id"`
Size int `json:"size"`
}
type subIDRange struct {
Start int
Length int
}
type ranges []subIDRange
func (e ranges) Len() int { return len(e) }
func (e ranges) Swap(i, j int) { e[i], e[j] = e[j], e[i] }
func (e ranges) Less(i, j int) bool { return e[i].Start < e[j].Start }
const (
subuidFileName string = "/etc/subuid"
subgidFileName string = "/etc/subgid"
)
// MkdirAllAs creates a directory (include any along the path) and then modifies
// ownership to the requested uid/gid. If the directory already exists, this
// function will still change ownership to the requested uid/gid pair.
func MkdirAllAs(path string, mode os.FileMode, ownerUID, ownerGID int) error {
return mkdirAs(path, mode, ownerUID, ownerGID, true, true)
}
// MkdirAllNewAs creates a directory (include any along the path) and then modifies
// ownership ONLY of newly created directories to the requested uid/gid. If the
// directories along the path exist, no change of ownership will be performed
func MkdirAllNewAs(path string, mode os.FileMode, ownerUID, ownerGID int) error {
return mkdirAs(path, mode, ownerUID, ownerGID, true, false)
}
// MkdirAs creates a directory and then modifies ownership to the requested uid/gid.
// If the directory already exists, this function still changes ownership
func MkdirAs(path string, mode os.FileMode, ownerUID, ownerGID int) error {
return mkdirAs(path, mode, ownerUID, ownerGID, false, true)
}
// GetRootUIDGID retrieves the remapped root uid/gid pair from the set of maps.
// If the maps are empty, then the root uid/gid will default to "real" 0/0
func GetRootUIDGID(uidMap, gidMap []IDMap) (int, int, error) {
var uid, gid int
if uidMap != nil {
xUID, err := ToHost(0, uidMap)
if err != nil {
return -1, -1, err
}
uid = xUID
}
if gidMap != nil {
xGID, err := ToHost(0, gidMap)
if err != nil {
return -1, -1, err
}
gid = xGID
}
return uid, gid, nil
}
// ToContainer takes an id mapping, and uses it to translate a
// host ID to the remapped ID. If no map is provided, then the translation
// assumes a 1-to-1 mapping and returns the passed in id
func ToContainer(hostID int, idMap []IDMap) (int, error) {
if idMap == nil {
return hostID, nil
}
for _, m := range idMap {
if (hostID >= m.HostID) && (hostID <= (m.HostID + m.Size - 1)) {
contID := m.ContainerID + (hostID - m.HostID)
return contID, nil
}
}
return -1, fmt.Errorf("Host ID %d cannot be mapped to a container ID", hostID)
}
// ToHost takes an id mapping and a remapped ID, and translates the
// ID to the mapped host ID. If no map is provided, then the translation
// assumes a 1-to-1 mapping and returns the passed in id #
func ToHost(contID int, idMap []IDMap) (int, error) {
if idMap == nil {
return contID, nil
}
for _, m := range idMap {
if (contID >= m.ContainerID) && (contID <= (m.ContainerID + m.Size - 1)) {
hostID := m.HostID + (contID - m.ContainerID)
return hostID, nil
}
}
return -1, fmt.Errorf("Container ID %d cannot be mapped to a host ID", contID)
}
// CreateIDMappings takes a requested user and group name and
// using the data from /etc/sub{uid,gid} ranges, creates the
// proper uid and gid remapping ranges for that user/group pair
func CreateIDMappings(username, groupname string) ([]IDMap, []IDMap, error) {
subuidRanges, err := parseSubuid(username)
if err != nil {
return nil, nil, err
}
subgidRanges, err := parseSubgid(groupname)
if err != nil {
return nil, nil, err
}
if len(subuidRanges) == 0 {
return nil, nil, fmt.Errorf("No subuid ranges found for user %q", username)
}
if len(subgidRanges) == 0 {
return nil, nil, fmt.Errorf("No subgid ranges found for group %q", groupname)
}
return createIDMap(subuidRanges), createIDMap(subgidRanges), nil
}
func createIDMap(subidRanges ranges) []IDMap {
idMap := []IDMap{}
// sort the ranges by lowest ID first
sort.Sort(subidRanges)
containerID := 0
for _, idrange := range subidRanges {
idMap = append(idMap, IDMap{
ContainerID: containerID,
HostID: idrange.Start,
Size: idrange.Length,
})
containerID = containerID + idrange.Length
}
return idMap
}
func parseSubuid(username string) (ranges, error) {
return parseSubidFile(subuidFileName, username)
}
func parseSubgid(username string) (ranges, error) {
return parseSubidFile(subgidFileName, username)
}
// parseSubidFile will read the appropriate file (/etc/subuid or /etc/subgid)
// and return all found ranges for a specified username. If the special value
// "ALL" is supplied for username, then all ranges in the file will be returned
func parseSubidFile(path, username string) (ranges, error) {
var rangeList ranges
subidFile, err := os.Open(path)
if err != nil {
return rangeList, err
}
defer subidFile.Close()
s := bufio.NewScanner(subidFile)
for s.Scan() {
if err := s.Err(); err != nil {
return rangeList, err
}
text := strings.TrimSpace(s.Text())
if text == "" || strings.HasPrefix(text, "#") {
continue
}
parts := strings.Split(text, ":")
if len(parts) != 3 {
return rangeList, fmt.Errorf("Cannot parse subuid/gid information: Format not correct for %s file", path)
}
if parts[0] == username || username == "ALL" {
startid, err := strconv.Atoi(parts[1])
if err != nil {
return rangeList, fmt.Errorf("String to int conversion failed during subuid/gid parsing of %s: %v", path, err)
}
length, err := strconv.Atoi(parts[2])
if err != nil {
return rangeList, fmt.Errorf("String to int conversion failed during subuid/gid parsing of %s: %v", path, err)
}
rangeList = append(rangeList, subIDRange{startid, length})
}
}
return rangeList, nil
}

View file

@ -0,0 +1,60 @@
// +build !windows
package idtools
import (
"os"
"path/filepath"
"github.com/docker/docker/pkg/system"
)
func mkdirAs(path string, mode os.FileMode, ownerUID, ownerGID int, mkAll, chownExisting bool) error {
// make an array containing the original path asked for, plus (for mkAll == true)
// all path components leading up to the complete path that don't exist before we MkdirAll
// so that we can chown all of them properly at the end. If chownExisting is false, we won't
// chown the full directory path if it exists
var paths []string
if _, err := os.Stat(path); err != nil && os.IsNotExist(err) {
paths = []string{path}
} else if err == nil && chownExisting {
if err := os.Chown(path, ownerUID, ownerGID); err != nil {
return err
}
// short-circuit--we were called with an existing directory and chown was requested
return nil
} else if err == nil {
// nothing to do; directory path fully exists already and chown was NOT requested
return nil
}
if mkAll {
// walk back to "/" looking for directories which do not exist
// and add them to the paths array for chown after creation
dirPath := path
for {
dirPath = filepath.Dir(dirPath)
if dirPath == "/" {
break
}
if _, err := os.Stat(dirPath); err != nil && os.IsNotExist(err) {
paths = append(paths, dirPath)
}
}
if err := system.MkdirAll(path, mode); err != nil && !os.IsExist(err) {
return err
}
} else {
if err := os.Mkdir(path, mode); err != nil && !os.IsExist(err) {
return err
}
}
// even if it existed, we will chown the requested path + any subpaths that
// didn't exist when we called MkdirAll
for _, pathComponent := range paths {
if err := os.Chown(pathComponent, ownerUID, ownerGID); err != nil {
return err
}
}
return nil
}

View file

@ -0,0 +1,18 @@
// +build windows
package idtools
import (
"os"
"github.com/docker/docker/pkg/system"
)
// Platforms such as Windows do not support the UID/GID concept. So make this
// just a wrapper around system.MkdirAll.
func mkdirAs(path string, mode os.FileMode, ownerUID, ownerGID int, mkAll, chownExisting bool) error {
if err := system.MkdirAll(path, mode); err != nil && !os.IsExist(err) {
return err
}
return nil
}

View file

@ -0,0 +1,188 @@
package idtools
import (
"fmt"
"os/exec"
"path/filepath"
"regexp"
"sort"
"strconv"
"strings"
"sync"
)
// add a user and/or group to Linux /etc/passwd, /etc/group using standard
// Linux distribution commands:
// adduser --system --shell /bin/false --disabled-login --disabled-password --no-create-home --group <username>
// useradd -r -s /bin/false <username>
var (
once sync.Once
userCommand string
cmdTemplates = map[string]string{
"adduser": "--system --shell /bin/false --no-create-home --disabled-login --disabled-password --group %s",
"useradd": "-r -s /bin/false %s",
"usermod": "-%s %d-%d %s",
}
idOutRegexp = regexp.MustCompile(`uid=([0-9]+).*gid=([0-9]+)`)
// default length for a UID/GID subordinate range
defaultRangeLen = 65536
defaultRangeStart = 100000
userMod = "usermod"
)
func resolveBinary(binname string) (string, error) {
binaryPath, err := exec.LookPath(binname)
if err != nil {
return "", err
}
resolvedPath, err := filepath.EvalSymlinks(binaryPath)
if err != nil {
return "", err
}
//only return no error if the final resolved binary basename
//matches what was searched for
if filepath.Base(resolvedPath) == binname {
return resolvedPath, nil
}
return "", fmt.Errorf("Binary %q does not resolve to a binary of that name in $PATH (%q)", binname, resolvedPath)
}
// AddNamespaceRangesUser takes a username and uses the standard system
// utility to create a system user/group pair used to hold the
// /etc/sub{uid,gid} ranges which will be used for user namespace
// mapping ranges in containers.
func AddNamespaceRangesUser(name string) (int, int, error) {
if err := addUser(name); err != nil {
return -1, -1, fmt.Errorf("Error adding user %q: %v", name, err)
}
// Query the system for the created uid and gid pair
out, err := execCmd("id", name)
if err != nil {
return -1, -1, fmt.Errorf("Error trying to find uid/gid for new user %q: %v", name, err)
}
matches := idOutRegexp.FindStringSubmatch(strings.TrimSpace(string(out)))
if len(matches) != 3 {
return -1, -1, fmt.Errorf("Can't find uid, gid from `id` output: %q", string(out))
}
uid, err := strconv.Atoi(matches[1])
if err != nil {
return -1, -1, fmt.Errorf("Can't convert found uid (%s) to int: %v", matches[1], err)
}
gid, err := strconv.Atoi(matches[2])
if err != nil {
return -1, -1, fmt.Errorf("Can't convert found gid (%s) to int: %v", matches[2], err)
}
// Now we need to create the subuid/subgid ranges for our new user/group (system users
// do not get auto-created ranges in subuid/subgid)
if err := createSubordinateRanges(name); err != nil {
return -1, -1, fmt.Errorf("Couldn't create subordinate ID ranges: %v", err)
}
return uid, gid, nil
}
func addUser(userName string) error {
once.Do(func() {
// set up which commands are used for adding users/groups dependent on distro
if _, err := resolveBinary("adduser"); err == nil {
userCommand = "adduser"
} else if _, err := resolveBinary("useradd"); err == nil {
userCommand = "useradd"
}
})
if userCommand == "" {
return fmt.Errorf("Cannot add user; no useradd/adduser binary found")
}
args := fmt.Sprintf(cmdTemplates[userCommand], userName)
out, err := execCmd(userCommand, args)
if err != nil {
return fmt.Errorf("Failed to add user with error: %v; output: %q", err, string(out))
}
return nil
}
func createSubordinateRanges(name string) error {
// first, we should verify that ranges weren't automatically created
// by the distro tooling
ranges, err := parseSubuid(name)
if err != nil {
return fmt.Errorf("Error while looking for subuid ranges for user %q: %v", name, err)
}
if len(ranges) == 0 {
// no UID ranges; let's create one
startID, err := findNextUIDRange()
if err != nil {
return fmt.Errorf("Can't find available subuid range: %v", err)
}
out, err := execCmd(userMod, fmt.Sprintf(cmdTemplates[userMod], "v", startID, startID+defaultRangeLen-1, name))
if err != nil {
return fmt.Errorf("Unable to add subuid range to user: %q; output: %s, err: %v", name, out, err)
}
}
ranges, err = parseSubgid(name)
if err != nil {
return fmt.Errorf("Error while looking for subgid ranges for user %q: %v", name, err)
}
if len(ranges) == 0 {
// no GID ranges; let's create one
startID, err := findNextGIDRange()
if err != nil {
return fmt.Errorf("Can't find available subgid range: %v", err)
}
out, err := execCmd(userMod, fmt.Sprintf(cmdTemplates[userMod], "w", startID, startID+defaultRangeLen-1, name))
if err != nil {
return fmt.Errorf("Unable to add subgid range to user: %q; output: %s, err: %v", name, out, err)
}
}
return nil
}
func findNextUIDRange() (int, error) {
ranges, err := parseSubuid("ALL")
if err != nil {
return -1, fmt.Errorf("Couldn't parse all ranges in /etc/subuid file: %v", err)
}
sort.Sort(ranges)
return findNextRangeStart(ranges)
}
func findNextGIDRange() (int, error) {
ranges, err := parseSubgid("ALL")
if err != nil {
return -1, fmt.Errorf("Couldn't parse all ranges in /etc/subgid file: %v", err)
}
sort.Sort(ranges)
return findNextRangeStart(ranges)
}
func findNextRangeStart(rangeList ranges) (int, error) {
startID := defaultRangeStart
for _, arange := range rangeList {
if wouldOverlap(arange, startID) {
startID = arange.Start + arange.Length
}
}
return startID, nil
}
func wouldOverlap(arange subIDRange, ID int) bool {
low := ID
high := ID + defaultRangeLen
if (low >= arange.Start && low <= arange.Start+arange.Length) ||
(high <= arange.Start+arange.Length && high >= arange.Start) {
return true
}
return false
}
func execCmd(cmd, args string) ([]byte, error) {
execCmd := exec.Command(cmd, strings.Split(args, " ")...)
return execCmd.CombinedOutput()
}

View file

@ -0,0 +1,12 @@
// +build !linux
package idtools
import "fmt"
// AddNamespaceRangesUser takes a name and finds an unused uid, gid pair
// and calls the appropriate helper function to add the group and then
// the user to the group in /etc/group and /etc/passwd respectively.
func AddNamespaceRangesUser(name string) (int, int, error) {
return -1, -1, fmt.Errorf("No support for adding users or groups on this OS")
}

View file

@ -0,0 +1,51 @@
package ioutils
import (
"errors"
"io"
)
var errBufferFull = errors.New("buffer is full")
type fixedBuffer struct {
buf []byte
pos int
lastRead int
}
func (b *fixedBuffer) Write(p []byte) (int, error) {
n := copy(b.buf[b.pos:cap(b.buf)], p)
b.pos += n
if n < len(p) {
if b.pos == cap(b.buf) {
return n, errBufferFull
}
return n, io.ErrShortWrite
}
return n, nil
}
func (b *fixedBuffer) Read(p []byte) (int, error) {
n := copy(p, b.buf[b.lastRead:b.pos])
b.lastRead += n
return n, nil
}
func (b *fixedBuffer) Len() int {
return b.pos - b.lastRead
}
func (b *fixedBuffer) Cap() int {
return cap(b.buf)
}
func (b *fixedBuffer) Reset() {
b.pos = 0
b.lastRead = 0
b.buf = b.buf[:0]
}
func (b *fixedBuffer) String() string {
return string(b.buf[b.lastRead:b.pos])
}

View file

@ -0,0 +1,186 @@
package ioutils
import (
"errors"
"io"
"sync"
)
// maxCap is the highest capacity to use in byte slices that buffer data.
const maxCap = 1e6
// minCap is the lowest capacity to use in byte slices that buffer data
const minCap = 64
// blockThreshold is the minimum number of bytes in the buffer which will cause
// a write to BytesPipe to block when allocating a new slice.
const blockThreshold = 1e6
var (
// ErrClosed is returned when Write is called on a closed BytesPipe.
ErrClosed = errors.New("write to closed BytesPipe")
bufPools = make(map[int]*sync.Pool)
bufPoolsLock sync.Mutex
)
// BytesPipe is io.ReadWriteCloser which works similarly to pipe(queue).
// All written data may be read at most once. Also, BytesPipe allocates
// and releases new byte slices to adjust to current needs, so the buffer
// won't be overgrown after peak loads.
type BytesPipe struct {
mu sync.Mutex
wait *sync.Cond
buf []*fixedBuffer
bufLen int
closeErr error // error to return from next Read. set to nil if not closed.
}
// NewBytesPipe creates new BytesPipe, initialized by specified slice.
// If buf is nil, then it will be initialized with slice which cap is 64.
// buf will be adjusted in a way that len(buf) == 0, cap(buf) == cap(buf).
func NewBytesPipe() *BytesPipe {
bp := &BytesPipe{}
bp.buf = append(bp.buf, getBuffer(minCap))
bp.wait = sync.NewCond(&bp.mu)
return bp
}
// Write writes p to BytesPipe.
// It can allocate new []byte slices in a process of writing.
func (bp *BytesPipe) Write(p []byte) (int, error) {
bp.mu.Lock()
written := 0
loop0:
for {
if bp.closeErr != nil {
bp.mu.Unlock()
return written, ErrClosed
}
if len(bp.buf) == 0 {
bp.buf = append(bp.buf, getBuffer(64))
}
// get the last buffer
b := bp.buf[len(bp.buf)-1]
n, err := b.Write(p)
written += n
bp.bufLen += n
// errBufferFull is an error we expect to get if the buffer is full
if err != nil && err != errBufferFull {
bp.wait.Broadcast()
bp.mu.Unlock()
return written, err
}
// if there was enough room to write all then break
if len(p) == n {
break
}
// more data: write to the next slice
p = p[n:]
// make sure the buffer doesn't grow too big from this write
for bp.bufLen >= blockThreshold {
bp.wait.Wait()
if bp.closeErr != nil {
continue loop0
}
}
// add new byte slice to the buffers slice and continue writing
nextCap := b.Cap() * 2
if nextCap > maxCap {
nextCap = maxCap
}
bp.buf = append(bp.buf, getBuffer(nextCap))
}
bp.wait.Broadcast()
bp.mu.Unlock()
return written, nil
}
// CloseWithError causes further reads from a BytesPipe to return immediately.
func (bp *BytesPipe) CloseWithError(err error) error {
bp.mu.Lock()
if err != nil {
bp.closeErr = err
} else {
bp.closeErr = io.EOF
}
bp.wait.Broadcast()
bp.mu.Unlock()
return nil
}
// Close causes further reads from a BytesPipe to return immediately.
func (bp *BytesPipe) Close() error {
return bp.CloseWithError(nil)
}
// Read reads bytes from BytesPipe.
// Data could be read only once.
func (bp *BytesPipe) Read(p []byte) (n int, err error) {
bp.mu.Lock()
if bp.bufLen == 0 {
if bp.closeErr != nil {
bp.mu.Unlock()
return 0, bp.closeErr
}
bp.wait.Wait()
if bp.bufLen == 0 && bp.closeErr != nil {
err := bp.closeErr
bp.mu.Unlock()
return 0, err
}
}
for bp.bufLen > 0 {
b := bp.buf[0]
read, _ := b.Read(p) // ignore error since fixedBuffer doesn't really return an error
n += read
bp.bufLen -= read
if b.Len() == 0 {
// it's empty so return it to the pool and move to the next one
returnBuffer(b)
bp.buf[0] = nil
bp.buf = bp.buf[1:]
}
if len(p) == read {
break
}
p = p[read:]
}
bp.wait.Broadcast()
bp.mu.Unlock()
return
}
func returnBuffer(b *fixedBuffer) {
b.Reset()
bufPoolsLock.Lock()
pool := bufPools[b.Cap()]
bufPoolsLock.Unlock()
if pool != nil {
pool.Put(b)
}
}
func getBuffer(size int) *fixedBuffer {
bufPoolsLock.Lock()
pool, ok := bufPools[size]
if !ok {
pool = &sync.Pool{New: func() interface{} { return &fixedBuffer{buf: make([]byte, 0, size)} }}
bufPools[size] = pool
}
bufPoolsLock.Unlock()
return pool.Get().(*fixedBuffer)
}

Some files were not shown because too many files have changed in this diff Show more