DEFAULT PROJECT CONCEPT
Note: Within this description, patterns that are relevant to the
description will be in [brackets] and CAPITALIZED, e.g.: [MESSAGE
CHANNEL]. When multiple possible pattern suggestions exist, feel
free to choose the best one for your design or substitute another you
feel fits better.
If you're
having trouble coming up with your own project idea, you may use and
implement the following project idea as your own.
Note that this is a proposal you can choose to use. It's not a
"specification" for a lab, but a "suggestion" on how you might go about
designing a solution for the project. You may (and certainly will)
change this as you go through the coding of your project, substituting
other patterns for the ones suggested here. You may find that
Hohpe & Woolf, Chapter 13, "Integration Patterns in Practice" will
be helpful in this context and will offer further examples of pattern
application.
GENERAL PROBLEM:
Imagine you are writing a statistics generator for a set of trading
engines, each of which may be running on computers in New York, London,
and Tokyo.
Each trading engine, which will be a fairly simple consumer [SELECTIVE
CONSUMER, POLLING CONSUMER, etc.], will
periodically look at different message channels [POINT-TO-POINT CHANNEL
or PUBLISH-SUBSCRIBE CHANNEL] for the latest
statistics on stocks in a portfolio [COMPOSITE] (you can keep it basic
with stocks or you could
choose to incorporate options, futures, etc. if you want). Your
goal in this default course project is to load the channels (queues or
topics...your decision...see implications below) with various
statistics for bids and asks (min, max, average, standard deviation,
variance, regression, moving average, etc.). Your program would read
in tic data, run the statistics [STRATEGY, TEMPLATE METHOD], and load
the stat channels; your trading engines, each of which manages a
portfolio of stocks, would then update the stocks in their portfolios
with the appropriate stats. You should have a couple of different
trading engines, each of which is interested in different stocks and
different statistics from the channels. Each trading engine should
have a single instance of a "Reporting Engine" [SINGLETON] that will
periodically print out stat data for its portfolio [COMPOSITE,
ITERATOR].
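For example, a minimal sketch of such a Reporting Engine as a classic
[SINGLETON] might look like the following (the class name and the
simplified Map-based portfolio are illustrative assumptions, not
requirements):

import java.util.Map;

// A minimal SINGLETON sketch for the per-engine Reporting Engine.
// The class name and the Map-based portfolio are assumptions; a real
// portfolio would likely be a COMPOSITE of stocks and their stats.
public class ReportingEngine {
    private static final ReportingEngine INSTANCE = new ReportingEngine();

    private ReportingEngine() { }              // no outside construction

    public static ReportingEngine getInstance() {
        return INSTANCE;
    }

    // Walk the portfolio [COMPOSITE, ITERATOR] and print each stock's stats.
    public void report(Map<String, String> portfolioStats) {
        for (Map.Entry<String, String> e : portfolioStats.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}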
DESIGN IMPLICATIONS:
You can design this
so that you have one channel per stock, with multiple statistics inside
each stock's channel. Or, you could choose to have multiple
statistics channels, one for min, one for max, one for average, etc.,
and put the various stock values on the channel, so the "STD_DEV"
channel would hold messages [MESSAGE] that contain stdev values for
MSFT, IBM, etc. You might also decide to design this so that each
traded stock has its own dedicated set of statistics channels [DATATYPE
CHANNEL].
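To make those layout choices concrete, here is a hedged sketch of the
three options as Camel routes (all of the JMS topic names are
assumptions about your own naming scheme):

import org.apache.camel.builder.RouteBuilder;

// Sketch of the three channel layouts; the topic names are assumptions.
public class ChannelLayoutRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // (a) one channel per stock, carrying all of that stock's statistics
        from("jms:topic:stats.MSFT").log("MSFT stat: ${body}");

        // (b) one channel per statistic, messages tagged with the symbol
        from("jms:topic:STD_DEV").log("stdev for ${header.SYMBOL}: ${body}");

        // (c) a DATATYPE CHANNEL per stock/statistic combination
        from("jms:topic:MSFT_STDEV").log("MSFT stdev: ${body}");
    }
}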
You would then have a few very simple "trading
engines" that would subscribe to the various channels and
consume the statistics off the channels (one trading engine might only
be interested in MSFT and ORCL data, another trading engine might be
interested
in only average and standard deviation for IBM, ORCL, and MSFT,
etc.). (Certainly, you can challenge yourself further and make
this more interesting by including concepts such as depth of book and
"Reversion to Mean" strategies, if these are meaningful and you are so
inclined...such enhancements would be totally up to you.)
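For instance, a [SELECTIVE CONSUMER] can often be expressed as a JMS
message selector right on the Camel endpoint; in the sketch below, the
statistics topic and the SYMBOL/STAT_TYPE headers are assumptions about
how your producer tags its messages:

import org.apache.camel.builder.RouteBuilder;

// SELECTIVE CONSUMER sketch: this engine only wants MSFT and ORCL data.
// Assumes the statistics producer sets SYMBOL and STAT_TYPE properties.
public class MsftOrclEngineRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:topic:statistics?selector=SYMBOL IN ('MSFT','ORCL')")
            .log("got ${header.STAT_TYPE} for ${header.SYMBOL}: ${body}");
    }
}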
You will need a "data feed" that delivers tic information
periodically. You can use this file, which includes 333 discrete tic
information files for MSFT, IBM, and ORCL (or you can create your own
data feed if you're interested in expanding to options and futures).
and futures). You could define a Camel Endpoint [ENDPOINT] that
consumes those data files and puts the tic information on a channel.
You may need to use a translator [MESSAGE TRANSLATOR] to get the information into a format that you prefer.
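A minimal sketch of that feed route, assuming a local directory of tic
files and a jms:ticData channel (both names are assumptions), might be:

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;

// Sketch: consume tic files [ENDPOINT], translate each body into your
// preferred format [MESSAGE TRANSLATOR], and publish to a tic channel.
public class TicFeedRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:data/tics?noop=true")
            .process(new Processor() {
                public void process(Exchange exchange) throws Exception {
                    String raw = exchange.getIn().getBody(String.class);
                    // replace this with whatever parsing/reshaping your
                    // statistics engine actually needs
                    exchange.getIn().setBody(raw.trim());
                }
            })
            .to("jms:ticData");
    }
}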
You might
decide to make your trading engines a little more "realistic" by having
them actually "trade" (buy, sell, or hold) given the incoming
statistical data (leveraging a moving average or something even more
simple). If you choose to implement this, your engine will definitely
have to "remember" what its position is in a given stock, whether it's
long or short that stock, etc. [STATE].
You should be concerned about "invalid" input data from the data
feed. You might want to incorporate an error channel [INVALID
MESSAGE CHANNEL] should invalid data input be discovered on publication.
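One hedged way to wire that up is a Camel content-based check that
shunts bad tics onto a dedicated channel; the PRICE header, the sanity
check, and the channel names below are all assumptions:

import org.apache.camel.builder.RouteBuilder;

// INVALID MESSAGE CHANNEL sketch: tics failing a basic sanity check go
// to jms:invalidTics. The PRICE header and channel names are assumed.
public class TicValidationRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:ticData")
            .choice()
                .when(header("PRICE").isGreaterThan(0))
                    .to("jms:validTics")
                .otherwise()
                    .to("jms:invalidTics");   // the error channel
    }
}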
SUMMARY OF POSSIBLE DELIVERABLES:
You could arrange this in a variety of ways. One way would be to
have a single "Data Producer" that reads in tic data and publishes it
to a data channel. Then, you might have a single or dedicated
statistics engine that reads the tic data and publishes statistics out
to different channels (again organized either by stocks, statistics, or
some combination of the two). Then, you might have multiple
trading engines, each dedicated to a particular type of pricing
strategy, that consume the statistics they are interested in from the
various channels and print out their stats (or, if you choose to
enhance the trading engines a little, their positions).
Questions to ask yourself as you choose among the pattern options:
1. Queues or Topics? I.e., Point-to-Point or Publish-Subscribe? The key question here is: will you have multiple trading engines needing to access the same statistics?
2. How do you want to design the channels? Do you want to
have stock channels that hold multiple statistics for their stocks
(only)? Do you want to have different statistics channels ("AVG",
"STDEV", etc.) that hold specific statistics information for multiple
stocks? Do you want to have specific channels for a given stock
and statistic combination ("MSFT_STDEV")? One consideration in your
decision is which design would be most flexible when another trading
engine and set of stocks is added.
3. As for running the statistics, do any of them share parts of a
common algorithm? For example, if you run certain statistics such
as variance, standard deviation, regression, covariance, etc., might
those strategies leverage some of the same steps? And if so, is
there an opportunity for a TEMPLATE METHOD? (See the sketch following
this list.)
4. Do you need to enhance [CONTENT ENRICHER/Camel .enrich()] the
tic data as it goes onto the data feed channel with additional
information such as a timestamp, etc.?
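On question 3, variance and standard deviation do share the "sum of
squared deviations from the mean" steps, which is exactly the shape
TEMPLATE METHOD captures. A hedged sketch (the class and method names
are assumptions):

// TEMPLATE METHOD sketch for question 3: the shared skeleton lives in
// the abstract class; subclasses supply only the final varying step.
public abstract class DeviationStatistic {
    // the template method: a fixed algorithm skeleton
    public final double compute(double[] prices) {
        double mean = mean(prices);
        double sumSq = 0.0;
        for (double p : prices) {
            sumSq += (p - mean) * (p - mean);
        }
        return finish(sumSq / prices.length);   // the step that varies
    }

    private double mean(double[] prices) {
        double sum = 0.0;
        for (double p : prices) sum += p;
        return sum / prices.length;
    }

    protected abstract double finish(double variance);
}

class Variance extends DeviationStatistic {
    protected double finish(double variance) { return variance; }
}

class StdDev extends DeviationStatistic {
    protected double finish(double variance) { return Math.sqrt(variance); }
}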
Overall Suggestions:
Remember that the following few very simple lines of Camel code actually embed several different enterprise integration patterns:
ConnectionFactory connectionFactory =
        new ActiveMQConnectionFactory("tcp://localhost:61616");
context.addComponent("jms",
        JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));
. . .
from("ftp://localhost/orders?username=secret&password=secret")
    .process(new Processor() {
        public void process(Exchange exchange) throws Exception {
            System.out.println("We just downloaded: " +
                    exchange.getIn().getHeader("CamelFileName"));
        }
    })
    .to("jms:incomingOrders");
This snippet already leverages five Enterprise Integration Patterns:
Message Broker (ActiveMQ), Endpoint (from/to), Message Translator (the
Processor), Message Channel (jms:incomingOrders), and Message (which
encapsulates the data being passed between endpoints, from FTP file to
JMS queue).
Project Expansions and Extrapolations:
You could challenge yourself further by extending this idea to
generate Monte Carlo simulations, etc.
You could also challenge yourself by enhancing the core concept of
stocks with options (derivatives), in which your stats leverage
Black-Scholes and become the Greeks: beta, delta, gamma, theta, vega,
kappa, rho, etc.
You
might decide to transform [MESSAGE TRANSLATOR] your messages into
a common model on the way to a channel so that the impact of different
desired formats among different trading engines is minimized [CANONICAL
DATA MODEL].
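As a sketch of that idea, you might translate every raw feed line into
one canonical tic class before it reaches the shared channel; the line
format, class names, and channel names below are assumptions:

import org.apache.camel.builder.RouteBuilder;

// MESSAGE TRANSLATOR into a CANONICAL DATA MODEL; names are assumptions.
public class CanonicalModelRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:rawTics")
            .bean(TicTranslator.class, "toCanonical")
            .to("jms:canonicalTics");
    }
}

class TicTranslator {
    // Parses an assumed "SYMBOL,PRICE,SIZE" line into the canonical model.
    public CanonicalTic toCanonical(String rawLine) {
        String[] parts = rawLine.split(",");
        return new CanonicalTic(parts[0],
                                Double.parseDouble(parts[1]),
                                Integer.parseInt(parts[2]));
    }
}

// Serializable so the POJO can travel as a JMS ObjectMessage.
class CanonicalTic implements java.io.Serializable {
    final String symbol;
    final double price;
    final int size;

    CanonicalTic(String symbol, double price, int size) {
        this.symbol = symbol;
        this.price = price;
        this.size = size;
    }
}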
You might decide to enhance your statistics with qualitative
information (in addition to quantitative data for statistics) and keep
track of public "mentions" of a particular stock in the news.
Certainly, natural language processing is beyond our simple
example. But you could think of a stub [PROXY] that reads several
financial RSS feeds, counts the times "MSFT", "ORCL", etc. appear
in the news, and publishes those counts to a channel (perhaps
transformed into categorical codes). Your trading engines could then
keep that information in mind as well while they operate. Would you
want a proxy for each stock you're trading, or might trade? You
would have to be careful here, because you can't keep track of all
stocks, as big data tends to be BIG. The idea of course is that
at some point "in the future" you would replace the stubs with actual
strategies that do qualitative statistics on big data inputs.
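A stub along those lines could be as small as the following (the
interface and the canned counts are placeholder assumptions for the
eventual real qualitative strategy):

import java.util.HashMap;
import java.util.Map;

// PROXY stub sketch: stands in for a future RSS/NLP mention counter.
// The interface and the canned counts are placeholder assumptions.
interface MentionCounter {
    int mentionsToday(String symbol);
}

class StubMentionCounter implements MentionCounter {
    private final Map<String, Integer> canned = new HashMap<String, Integer>();

    StubMentionCounter() {
        canned.put("MSFT", 7);   // canned data standing in for real RSS scans
        canned.put("ORCL", 3);
        canned.put("IBM", 5);
    }

    public int mentionsToday(String symbol) {
        Integer n = canned.get(symbol);
        return n == null ? 0 : n;
    }
}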