
lgatto/proteowizard


ProteoWizard

This is a copy of the ProteoWizard code, used to learn about and explore the project. See the official ProteoWizard page for details. The original code is available at http://proteowizard.sourceforge.net.

License

Apache License Version 2.0 - see LICENSE for details.

Installation and compilation

git clone git@github.com:lgatto/proteowizard.git
cd proteowizard
./quickbuild.sh
$ cat BUILDING
BUILDING PROTEOWIZARD

Run "quickbuild.sh" to build ProteoWizard. (Use "quickbuild.bat" for Windows.) 

Use the "--help" switch for some useful hints on building ProteoWizard.

Open doc/index.html in a browser for more detailed info on
ProteoWizard, known issues, etc.

Technical documentation

See [./pwiz/doc/technical/index.html]

Directory Layout

[root]
    build         [everything is built here]
    doc           [documentation]
    example_data  [some example data files]
    libraries     [3rd party library archives]
    pwiz          [main source tree -- all Apache licensed]
    pwiz_aux      [non-Apache licensed contributed code]
    pwiz_tools    [source code for tools]

Projects

Here is an outline of the various ProteoWizard projects, organized by dependency level. There may be dependencies within a given level, but there should never be any up-level dependencies. Unless otherwise noted, all projects are cross-platform. Each project's source files are contained in the subdirectory of the same name within the directory given for its level (e.g. pwiz/utility/math).

level 0 (pwiz/utility):

  • math: Mathematics classes (linear algebra, statistics, special mathematical functions)
  • misc: Miscellaneous standalone utility classes (Base64, SHA-1, 2D drawing, unit testing)
  • minimxml: XML parsing and writing
  • proteome: Chemical formula, peptide, and isotope calculations.
  • vendor_api: Vendor-specific API wrappers (Windows only)

level 1 (pwiz/data):

  • msdata: Mass spec file format abstraction layer.
  • ident: identification file format abstraction layer.
  • misc: Library containing classes for handling FT transient data, complex frequency data, MS1 peak data.
  • vendor_readers: Vendor-specific Reader implementations

level 2 (pwiz/analysis):

  • chromatogram_processing: Chromatogram analysis
  • frequency: Library of routines for frequency-domain peak detection.
  • passive: Event-driven analysis modules
  • peakdetect: General interface for peak detection
  • peptideid: Modules handling peptide id info abstraction and parsing
  • spectrum_processing: Spectrum analysis

level 3 (pwiz_tools):

  • commandline: Command-line tools
  • SeeMS: Graphic data visualization program (Windows only)
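The "no up-level dependencies" rule can be checked mechanically. The C++ sketch below is not part of ProteoWizard: it assigns a few of the projects the levels listed above and reports every dependency that points to a higher level. The dependency edges used with it are hypothetical examples, not a description of the real build graph.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Levels from the project outline (0 = pwiz/utility ... 3 = pwiz_tools).
// Full paths are used because "misc" exists at both level 0 and level 1.
const std::map<std::string, int> kLevel = {
    {"pwiz/utility/math", 0},
    {"pwiz/utility/misc", 0},
    {"pwiz/utility/minimxml", 0},
    {"pwiz/utility/proteome", 0},
    {"pwiz/data/msdata", 1},
    {"pwiz/data/misc", 1},
    {"pwiz/analysis/spectrum_processing", 2},
    {"pwiz_tools/commandline", 3},
};

// A dependency edge: `first` depends on `second`.
using Edge = std::pair<std::string, std::string>;

// Returns the edges that violate the rule, i.e. depend on a higher level.
// Same-level dependencies are allowed, as the outline states.
std::vector<Edge> upLevelViolations(const std::vector<Edge>& edges) {
    std::vector<Edge> bad;
    for (const Edge& e : edges) {
        auto from = kLevel.find(e.first);
        auto to = kLevel.find(e.second);
        assert(from != kLevel.end() && to != kLevel.end()); // unknown project name
        if (to->second > from->second) bad.push_back(e);
    }
    return bad;
}
```

For example, an edge from pwiz/utility/math to pwiz/data/msdata would be reported, while an edge from pwiz_tools/commandline down to pwiz/data/msdata would not.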

Code Conventions

A code module consists of an interface (Foo.hpp), implementation (Foo.cpp), and a unit test (FooTest.cpp). The interface should be self-documenting, with optional inclusion of comment markup for automated documentation tools (e.g. Doxygen). The unit test serves two purposes:

  • To exercise the module's interface and validate its behavior independent of other modules.
  • To document the intended usage of the code module.

Clients of a code module should never need to look at the implementation for questions about usage.
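As a sketch of this convention, here is a hypothetical module Foo with its three parts collapsed into one listing so it compiles on its own. The function, namespace and constant are invented for illustration. ProteoWizard ships its own unit-testing helpers in misc; the sketch uses plain assert instead.

```cpp
// ---- Foo.hpp: the interface, self-documenting ----
#include <cassert>
#include <cmath>

namespace foo_sketch {

/// Mass of a proton in daltons.
constexpr double kProtonMass = 1.00727646688;

/// Returns the m/z of a molecule with neutral mass `neutralMass` (Da)
/// carrying `charge` added protons. Requires charge > 0.
double mzFromNeutralMass(double neutralMass, int charge);

} // namespace foo_sketch

// ---- Foo.cpp: the implementation ----
namespace foo_sketch {

double mzFromNeutralMass(double neutralMass, int charge) {
    assert(charge > 0);
    return (neutralMass + charge * kProtonMass) / charge;
}

} // namespace foo_sketch

// ---- FooTest.cpp: exercises the interface and documents its usage ----
namespace foo_sketch {

// Tolerance-based comparison for floating-point results.
bool near(double a, double b) { return std::fabs(a - b) < 1e-9; }

void testFoo() {
    // Singly charged: one proton is added.
    assert(near(mzFromNeutralMass(1000.0, 1), 1000.0 + kProtonMass));
    // Doubly charged: (M + 2 * proton) / 2.
    assert(near(mzFromNeutralMass(1000.0, 2), 500.0 + kProtonMass));
}

} // namespace foo_sketch
```

In a real FooTest.cpp, a main() would call testFoo(), so a reader can learn the intended usage from the test alone without opening Foo.cpp.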

ProteoWizard Data Access Layer Design

The ProteoWizard data access layer library is pwiz/data/msdata, and the interface and data structure definitions are in MSData.hpp.

The data model is a one-to-one translation from mzML data elements to C++ structs. The root mzML element corresponds to an MSData struct, and the sub-elements correspond to structs with similar names. SpectrumList has a virtual interface, which allows for lazy evaluation backed by a data file.
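The virtual SpectrumList interface is what makes lazy evaluation possible: client code asks for a spectrum by index, and an implementation may decode it from the file only at that moment. The sketch below is loosely modeled on that idea; the class names, the simplified Spectrum fields, the caching, and the simulated "file" (a callback that counts loads) are hypothetical and much simpler than the real MSData.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the mzML <spectrum> element.
struct Spectrum {
    std::size_t index = 0;
    std::string id;
    std::vector<double> mz, intensity;
};
using SpectrumPtr = std::shared_ptr<Spectrum>;

// Abstract list: client code depends only on this interface.
class SpectrumList {
public:
    virtual ~SpectrumList() = default;
    virtual std::size_t size() const = 0;
    virtual SpectrumPtr spectrum(std::size_t index) const = 0;
};

// Fully in-memory implementation.
class SpectrumListSimple : public SpectrumList {
public:
    std::vector<SpectrumPtr> spectra;
    std::size_t size() const override { return spectra.size(); }
    SpectrumPtr spectrum(std::size_t index) const override {
        assert(index < spectra.size());
        return spectra[index];
    }
};

// Lazy implementation: a spectrum is produced by `load` on first access and
// cached. A real file-backed list would seek into the file and decode here.
class SpectrumListLazy : public SpectrumList {
public:
    SpectrumListLazy(std::size_t n, std::function<SpectrumPtr(std::size_t)> load)
        : n_(n), load_(std::move(load)) {}
    std::size_t size() const override { return n_; }
    SpectrumPtr spectrum(std::size_t index) const override {
        assert(index < n_);
        auto it = cache_.find(index);
        if (it != cache_.end()) return it->second;
        SpectrumPtr s = load_(index);
        cache_[index] = s;
        return s;
    }
private:
    std::size_t n_;
    std::function<SpectrumPtr(std::size_t)> load_;
    mutable std::map<std::size_t, SpectrumPtr> cache_;
};
```

Because both classes satisfy the same interface, code that iterates a `const SpectrumList&` does not need to know whether the spectra are already in memory or still on disk. The per-list cache is a choice made for this sketch only.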

The mzML controlled vocabulary (CV) is parsed at compile time, generating cv.hpp and cv.cpp. This allows CV terms to be used in a typesafe manner, and also makes the various CV relations and synonyms available to C++ client code.
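To illustrate what "typesafe" CV terms and C++-visible relations can look like, here is a hypothetical excerpt of generated code: each term becomes an enum value, so a misspelled term fails to compile, and is_a links are stored in a table that client code can query. The enum names, struct, and functions are illustrative and may not match the real generated cv.hpp. The accessions are from the PSI-MS vocabulary, but only the is_a links needed here are kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One enum value per CV term (hypothetical excerpt of a generated cv.hpp).
enum class CVID {
    MS_binary_data_array,
    MS_m_z_array,
    MS_intensity_array,
    MS_ms_level,
};

struct CVTermInfo {
    std::string accession;
    std::string name;
    std::vector<CVID> isA; // direct parents, truncated in this excerpt
};

// Term metadata and relations (hypothetical excerpt of a generated cv.cpp).
const CVTermInfo& cvTermInfo(CVID id) {
    static const CVTermInfo binaryDataArray{"MS:1000513", "binary data array", {}};
    static const CVTermInfo mzArray{"MS:1000514", "m/z array", {CVID::MS_binary_data_array}};
    static const CVTermInfo intensityArray{"MS:1000515", "intensity array", {CVID::MS_binary_data_array}};
    static const CVTermInfo msLevel{"MS:1000511", "ms level", {}};
    switch (id) {
        case CVID::MS_binary_data_array: return binaryDataArray;
        case CVID::MS_m_z_array: return mzArray;
        case CVID::MS_intensity_array: return intensityArray;
        case CVID::MS_ms_level: return msLevel;
    }
    assert(false && "unknown CVID");
    return msLevel; // unreachable
}

// True if `child` equals `parent` or reaches it through is_a links.
bool cvIsA(CVID child, CVID parent) {
    if (child == parent) return true;
    for (CVID p : cvTermInfo(child).isA)
        if (cvIsA(p, parent)) return true;
    return false;
}
```

With such a table, a question like "is this array some kind of binary data array?" becomes `cvIsA(id, CVID::MS_binary_data_array)` instead of a string comparison on accessions.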

Mapping from the various structs to mzML is done in the IO module, and diff calculations are done in the Diff module. Serializer_mzML and Serializer_mzXML allow serialization to and from iostreams in mzML and mzXML formats, respectively.
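The division of labor between IO-style writing and reading over iostreams and a Diff-style comparison can be sketched with a single simplified struct. Everything below is hypothetical: the struct, the function names, and the naive attribute extraction (a real reader uses an XML parser and handles escaping). It only shows the round-trip idea: serialize, read back, and use a diff to confirm nothing was lost.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an mzML <spectrum> header.
struct SpectrumHeader {
    std::size_t index = 0;
    std::string id;
    std::size_t defaultArrayLength = 0;
};

// IO-style writer: struct -> XML element on any std::ostream.
void writeSpectrumHeader(std::ostream& os, const SpectrumHeader& s) {
    os << "<spectrum index=\"" << s.index << "\" id=\"" << s.id
       << "\" defaultArrayLength=\"" << s.defaultArrayLength << "\"/>\n";
}

// Returns the value of attribute `name` in `element`, or "" if absent.
// Naive: no XML parsing, no entity unescaping.
std::string attribute(const std::string& element, const std::string& name) {
    const std::string key = " " + name + "=\"";
    std::size_t start = element.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    std::size_t end = element.find('"', start);
    return element.substr(start, end - start);
}

// IO-style reader: one XML element from any std::istream -> struct.
// std::stoul throws std::invalid_argument if a numeric attribute is missing.
SpectrumHeader readSpectrumHeader(std::istream& is) {
    std::string line;
    std::getline(is, line);
    SpectrumHeader s;
    s.index = std::stoul(attribute(line, "index"));
    s.id = attribute(line, "id");
    s.defaultArrayLength = std::stoul(attribute(line, "defaultArrayLength"));
    return s;
}

// Diff-style comparison: one human-readable entry per differing field.
std::vector<std::string> diff(const SpectrumHeader& a, const SpectrumHeader& b) {
    std::vector<std::string> out;
    if (a.index != b.index) out.push_back("index");
    if (a.id != b.id) out.push_back("id: " + a.id + " != " + b.id);
    if (a.defaultArrayLength != b.defaultArrayLength) out.push_back("defaultArrayLength");
    return out;
}
```

Writing to a `std::stringstream` and reading back from it gives an in-memory round-trip test; an empty diff means the serialization preserved every field the sketch models.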

MSDataFile is a subclass of MSData that adds file I/O handling. MSDataFile::Reader provides a generic interface for file readers. By default, Readers for mzML, mzXML, and Thermo RAW files are provided. On instantiation with a filename, MSDataFile finds a Reader that will accept the file, and uses that Reader to fill in the internal data structures.
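The reader-selection step described above can be sketched as a list of readers, each of which inspects the file name and the first bytes of the file and either accepts or declines it. The class names, the identify/read split, and the header checks are assumptions made for this sketch, not the pwiz Reader API; only the two XML formats are covered, and no Thermo RAW handling is attempted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the data structures a reader fills in.
struct MSDataSketch {
    std::string sourceFormat;
};

// identify() returns a format name, or "" to decline the file.
class Reader {
public:
    virtual ~Reader() = default;
    virtual std::string identify(const std::string& filename, const std::string& head) const = 0;
    virtual void read(const std::string& filename, const std::string& head, MSDataSketch& result) const = 0;
};

class ReaderMzMLSketch : public Reader {
public:
    std::string identify(const std::string& /*filename*/, const std::string& head) const override {
        return head.find("<mzML") != std::string::npos ? "mzML" : "";
    }
    void read(const std::string& /*filename*/, const std::string& /*head*/, MSDataSketch& result) const override {
        result.sourceFormat = "mzML"; // a real reader would parse the file here
    }
};

class ReaderMzXMLSketch : public Reader {
public:
    std::string identify(const std::string& /*filename*/, const std::string& head) const override {
        return head.find("<mzXML") != std::string::npos ? "mzXML" : "";
    }
    void read(const std::string& /*filename*/, const std::string& /*head*/, MSDataSketch& result) const override {
        result.sourceFormat = "mzXML";
    }
};

// Tries readers in registration order and uses the first that accepts.
class ReaderListSketch {
public:
    void add(std::unique_ptr<Reader> r) {
        assert(r && "null reader");
        readers_.push_back(std::move(r));
    }
    // Returns the identified format, or "" if no reader accepted the file.
    std::string readFile(const std::string& filename, const std::string& head, MSDataSketch& result) const {
        for (const auto& r : readers_) {
            std::string type = r->identify(filename, head);
            if (!type.empty()) {
                r->read(filename, head, result);
                return type;
            }
        }
        return "";
    }
private:
    std::vector<std::unique_ptr<Reader>> readers_;
};
```

Keeping identification separate from reading lets new formats be supported by registering another reader, without changing the file-opening code.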

Sandbox

./pwiz/sandbox contains code initially taken from ./pwiz/doc/technical/hello_pwiz.

Notes

identdata

  • IdentData: mzIdentML structure.

  • IdentDataFile: IdentData object plus file I/O.

  • Reader: interface for file readers.

  • DefaultReaderList: reader classes for mzid, pepXML, protXML.

  • MascotReader: Mascot file reader.

  • Serializer_*: write mzid, pepXML, protXML and Text (tab delimited) files.

  • examples: generate example files.

  • IO: XML read/write.

  • References: functions for resolving references from objects into the internal IdentData lists

  • Pep2MzIdent: Translates data from a MinimumPepXML object into an IdentData object tree when a translation is known.

  • Other: DelimReader, DelimWriter, Diff, KwCVMap, MzidPredicates, TextWriter, Version.
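Reference resolution is needed because mzIdentML elements point at each other by id (for example, a PeptideEvidence refers to a DBSequence through its dBSequence_ref attribute). A minimal sketch, assuming that a freshly parsed reference is a placeholder object carrying only the id, which is then swapped for the shared object from the main list; the struct and function names are hypothetical, not the identdata API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified mzIdentML-like objects.
struct DBSequence {
    std::string id;
    std::string accession;
};
using DBSequencePtr = std::shared_ptr<DBSequence>;

struct PeptideEvidence {
    std::string id;
    DBSequencePtr dbSequencePtr; // after parsing: placeholder holding only the id
};

struct IdentDataSketch {
    std::vector<DBSequencePtr> dbSequences;
    std::vector<PeptideEvidence> peptideEvidence;
};

// Replaces each placeholder with the shared object from the main list.
// Throws std::runtime_error if a reference names an id that is not in the list.
void resolveReferences(IdentDataSketch& data) {
    std::map<std::string, DBSequencePtr> byId;
    for (const auto& s : data.dbSequences) {
        assert(s && "null entry in dbSequences");
        byId[s->id] = s;
    }
    for (auto& pe : data.peptideEvidence) {
        if (!pe.dbSequencePtr) continue; // nothing to resolve
        auto it = byId.find(pe.dbSequencePtr->id);
        if (it == byId.end())
            throw std::runtime_error("unresolved DBSequence reference: " + pe.dbSequencePtr->id);
        pe.dbSequencePtr = it->second;
    }
}
```

After resolution, every object that refers to the same DBSequence shares one instance, so its full contents (such as the accession) are reachable from each referrer.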

About

The ProteoWizard Library and Tools for - http://proteowizard.sourceforge.net/
