TO DO

Good luck understanding these notes for myself :) Some of this stuff has already been done.

JSON output plugin

# drop Div-Span-Structure
# add a hash for tagpattern
# add :url=> to matches

search for slow regexps. use examples in plugins

add variables for: :fwversion, :model, :modules (joomla, WP, etc)

only do aggressive checks that provide more infoz.. eg, if it's already matched and all it does is identify the system then skip it

make a webpage model and a website model

show which plugin is requesting a URL

add option to not thread - need to rewrite main loop 1st

should :text be case insensitive? maybe

# make plugin errors red. check out Zoph error on 

option to not display hashes? Hashes are most useful with a large dataset to find similar systems or when manually pentesting. maybe leave it in

why does it crash with low connect timeouts? eg. 1 second
/usr/lib/ruby/1.8/timeout.rb:60:in `timeout': execution expired	from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
	from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
	from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
	from /usr/lib/ruby/1.8/net/http.rb:1035:in `request'
	from ./whatweb:521:in `open_target'
	from ./whatweb:899
	from ./whatweb:830:in `initialize'
	from ./whatweb:830:in `new'
	from ./whatweb:830


#start using a new convention so this issue can be avoided when people learn by example:
# note the capture group around the version number; without it scan() has nothing to return
pattern=/<meta name=\"generator\" content=\"WordPress[ ]?([0-9\.]+)\"/
if @body =~ pattern
	version=@body.scan(pattern)[0][0]
	m << ....
end


#custom plugins on the cmdline
# ./whatweb --custom-plugin ":text=>'powered by fuck'" -i ./targets
# ./whatweb --custom-plugin "{:text=>'powered by fuck'},{:regexp=>/meta generator/}" -i ./targets
#move tag pattern into main section so any plugin can use it
#move md5 into main section so any plugin can use it

Write a quick guide to writing plugins

change -example scanning. -e <plugin>,<plugin> <-- just where 2 take the targets from
--- nah

# add prefix, suffix, etc... for manual usage.
#make help wrap around a standard terminal size

option to not output hash plugins, footer, header, tag pattern, div span, md5. plugin groups? categories?

try to reconnect when a timeout is received
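
A minimal sketch of the reconnect idea, assuming a small wrapper around whatever code opens the target. with_retries and the default of 3 attempts are made up for illustration, they are not in whatweb.

```ruby
require 'timeout'

# with_retries is a hypothetical helper; the attempt count is an assumption.
# It re-runs the block when a timeout is raised and gives up after
# `attempts` tries by re-raising the last timeout.
def with_retries(attempts = 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Timeout::Error
    retry if tries < attempts
    raise
  end
end

# Usage: the block stands in for the real connection code.
result = with_retries(3) do |try|
  raise Timeout::Error, "execution expired" if try < 2  # simulate one timeout
  "connected on try #{try}"
end
puts result
```

Re-raising after the last try keeps the existing error reporting intact, so a dead host still shows up as an error instead of being silently skipped.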

#add probability to XML output
#strip :probability requirement -- assume 100 if not specified. 
#strip :name requirement -- same as regexp, etc if not specified.
#rename probability to certainty
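
Roughly what the defaults above could look like when a match hash is loaded. normalise_match is a hypothetical helper, and using the pattern source as the fallback name is only one reading of "same as regexp".

```ruby
# normalise_match is a hypothetical helper, not whatweb's actual code.
def normalise_match(match)
  result = match.dup
  # accept the old :probability key but store it as :certainty
  result[:certainty] = result.delete(:probability) if result.key?(:probability)
  # assume 100 if not specified
  result[:certainty] ||= 100
  # when no :name is given, fall back to the pattern itself
  pattern = result[:regexp] || result[:text]
  result[:name] ||= pattern.respond_to?(:source) ? pattern.source : pattern.to_s
  result
end

p normalise_match({ :regexp => /powered by foo/ })
p normalise_match({ :name => "generator", :probability => 25, :text => "abc" })
```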

make local plugin take precedence over installed location

Treat targets like a stack ::  the stack is a global var
[a,r,r,a,y] pull next target from the array when the stack is empty
[s,t,a,c] pop from the stack until it's empty
[s,t,a,c,k] <-- plugins can push to the top of the stack. like links, css, js, whatever
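
A small sketch of the array + stack idea. TargetQueue is a made-up name, and it uses an object instead of a global var just to keep the example self-contained.

```ruby
# TargetQueue is a hypothetical class for illustration.
class TargetQueue
  def initialize(targets)
    @targets = targets.dup  # the original target list, read front to back
    @stack = []             # extra URLs pushed by plugins
  end

  # Plugins call this to queue a discovered URL (link, css, js, whatever).
  def push(url)
    @stack.push(url)
  end

  # Pop from the stack until it's empty, then pull the next target
  # from the array. Returns nil when both are exhausted.
  def next_target
    @stack.empty? ? @targets.shift : @stack.pop
  end
end

queue = TargetQueue.new(["http://a.example/", "http://b.example/"])
puts queue.next_target
queue.push("http://a.example/robots.txt")  # a plugin found something
puts queue.next_target
puts queue.next_target
```

The pushed URL is handled before the next target from the list, so everything belonging to one site stays together in the output.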


option to suppress 404s -- maybe they can just grep for 200 or -v 404

make plugins for http://trends.builtwith.com/cms



> one of the plugins attached, shoutcast administrator, doesn't work
> because it returns different HTTP 200 OK messages. is there some way to
> fix this or even use it for fingerprinting, rather than aborting the
> request?
> 
> ~/proj/whatweb-0.4.4dev$ ./whatweb  --log-brief=asdf.log -a 1 100.m2radio.fr:8000
> http://100.m2radio.fr:8000 ERROR: wrong status line: "ICY 200 OK"
>
> ~/proj/whatweb-0.4.4dev$ ./whatweb  --log-brief=asdf.log -a 1 128relay1.gothmetal.net:6666
> http://128relay1.gothmetal.net:6666 ERROR: wrong status line: "ICY 200 OK"
>
> ~/proj/whatweb-0.4.4dev$ ./whatweb  --log-brief=asdf.log -a 1 160.79.128.30:7702
> http://160.79.128.30:7702 ERROR: wrong status line: "ICY 400 Server Full"
>
> ~/proj/whatweb-0.4.4dev$ ./whatweb  --log-brief=asdf.log -a 1 aol-relay2.chuckeh.com:8026
> http://aol-relay2.chuckeh.com:8026 [302] Title[Redirect], RedirectLocation[http://scfire-ntc-aa03.stream.aol.com:80/stream/1051], MD5[2548da190bd45c8ab7b42f781b6d8b2c], Header-Hash[2548da190bd45c8ab7b42f781b6d8b2c]
> http://scfire-ntc-aa03.stream.aol.com:80/stream/1051 ERROR: wrong status line: "ICY 200 OK"


GUI
read XML and list urls by plugin

custom plugin defined on cmdline


each plugin should say what it identifies:
	* presence of system
	* version
	* modules/components/plugins
	* usernames/accountids/email addresses
	* other
	* find all pages
	
what methods it uses for each above task:
	* identify patterns in headers and HTML of any page
	* identify by pattern in guessed files
	* identify by existence of guessed files
	* identify by hash of guessed files
	
and whether the above are passive or aggressive functions.


possibly separate aggressive functions into:
def aggressive_identify
def aggressive_version
def aggressive_modules
def aggressive_users
def aggressive_other



i.e. someone should be able to turn on the aggressive version test for joomla and not do the aggressive module test.
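
One way the split could be driven, as a sketch. ExamplePlugin and run_aggressive are hypothetical, and the return values are placeholders; only the aggressive_* method names come from the list above.

```ruby
# ExamplePlugin stands in for a real plugin; results are placeholders.
class ExamplePlugin
  def aggressive_version
    [{ :name => "version check (placeholder)" }]
  end

  def aggressive_modules
    [{ :name => "module check (placeholder)" }]
  end
end

# Run only the requested aggressive tasks, skipping any the plugin
# does not implement.
def run_aggressive(plugin, tasks)
  tasks.map { |task|
    method_name = "aggressive_#{task}"
    plugin.respond_to?(method_name) ? plugin.send(method_name) : []
  }.flatten
end

# version test on, module test off; :users is not implemented so it is skipped
p run_aggressive(ExamplePlugin.new, [:version, :users])
```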

plugin identify file extensions in relative or same-site urls. eg. pl, asp, aspx, cfm, php, nsf, jsp, do.
we don't need 1 plugin per extension!
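
A single-plugin sketch for the extension idea. link_extensions is a made-up helper and it assumes the links passed in are already relative or same-site paths (a full URL like http://example.com would wrongly yield "com").

```ruby
# Extension list taken from the note above.
KNOWN_EXTENSIONS = %w[pl asp aspx cfm php nsf jsp do]

# link_extensions is a hypothetical helper. It strips the query string and
# fragment, takes the file extension and keeps only the known ones.
def link_extensions(links)
  found = links.map do |link|
    path = link.split(/[?#]/).first.to_s
    File.extname(path).sub('.', '').downcase
  end
  found.select { |ext| KNOWN_EXTENSIONS.include?(ext) }.uniq
end

p link_extensions(["/index.php?id=1", "/login.ASPX", "/img/logo.png", "view.do#top"])
```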

#change what verbose modes are. 
#	verbose 1.  + names of plugin rules that match
	verbose 2.  show HTTP headers, connections, all results from plugins


list of filetypes to download and scan. useful ones are: css, js, favicon.ico. not links, but stuff that a browser would include on a page.

#target prefix and target suffix

joomla plugin aggressive tests should be relative to the site root.
now they're just relative to the site.

need finer control of which plugins are aggressive, etc.

easily identify local vs remote links. eg. for passive joomla components
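
A sketch of a local/remote test using the standard URI library. local_link? is a hypothetical name, and comparing hosts only (not scheme or port) is an assumption.

```ruby
require 'uri'

# local_link? is a hypothetical helper. The link is resolved against the
# page URL, so relative links count as local. Unparseable links are
# treated as remote.
def local_link?(page_url, link)
  page = URI.parse(page_url)
  target = page.merge(link)
  target.host == page.host
rescue URI::Error
  false
end

puts local_link?("http://a.example/index.php", "components/com_foo/x.js")
puts local_link?("http://a.example/index.php", "http://cdn.example/x.js")
```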

better error logging

when loading plugins, check they have unique names
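
The name check could be as small as this sketch. duplicate_plugin_names is hypothetical, plugins are represented only by their names, and treating names as case insensitive is an assumption.

```ruby
# duplicate_plugin_names is a hypothetical helper.
# Returns the (lowercased) names that appear more than once.
def duplicate_plugin_names(names)
  counts = Hash.new(0)
  names.each { |name| counts[name.downcase] += 1 }
  counts.select { |_, count| count > 1 }.keys.sort
end

dupes = duplicate_plugin_names(["Joomla", "Drupal", "joomla", "WordPress"])
puts "duplicate plugin names: #{dupes.join(', ')}" unless dupes.empty?
```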

  -A,  --accept=LIST               comma-separated list of accepted extensions.
  -R,  --reject=LIST               comma-separated list of rejected extensions.


improve the md5 hash identification of versions - download tgz versions into a folder
find smallest number of files that can be used to identify a specific version:
md5hash for each file suitable
whitelist of filetypes suitable - jpg, ini, cfg, css, js
blacklist of filetypes unsuitable - inc, php, .htaccess
show files with the most differences
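
A sketch of the "files with the most differences" step, assuming the tarballs were already unpacked and hashed into a { version => { path => md5 } } structure. rank_files and that structure are assumptions, and the hashes below are placeholders, not real md5 sums.

```ruby
# Whitelist from the note above.
SUITABLE = %w[.jpg .ini .cfg .css .js]

# rank_files is a hypothetical helper. It ranks whitelisted files by how
# many distinct hashes they have across versions, most first. A file
# missing from a version counts as one more distinct value.
def rank_files(hashes)
  paths = hashes.values.map { |files| files.keys }.flatten.uniq
  paths = paths.select { |path| SUITABLE.include?(File.extname(path)) }
  ranked = paths.map do |path|
    distinct = hashes.values.map { |files| files[path] }.uniq.length
    [path, distinct]
  end
  ranked.sort_by { |path, distinct| [-distinct, path] }
end

# Placeholder data for illustration only.
example = {
  "1.0" => { "a.css" => "h1", "b.js" => "h5", "index.php" => "h7" },
  "1.1" => { "a.css" => "h2", "b.js" => "h5", "index.php" => "h8" },
  "1.2" => { "a.css" => "h3", "b.js" => "h6", "index.php" => "h9" }
}
rank_files(example).each { |path, distinct| puts "#{path} #{distinct}" }
```

A file with as many distinct hashes as there are versions identifies every version on its own, which gives the smallest possible set (one file).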

should escape brackets in output eg. <title>SMC[231] Console</title> currently becomes title[SMC[231] Console]
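
A possible fix, sketched with backslash escaping (the choice of backslash is an assumption, any unambiguous escape would do). escape_brackets and format_result are made-up names.

```ruby
# escape_brackets is a hypothetical helper. It escapes [ ] and the
# backslash itself so the output can be parsed back unambiguously.
def escape_brackets(value)
  value.gsub(/[\[\]\\]/) { |c| "\\#{c}" }
end

def format_result(plugin, value)
  "#{plugin}[#{escape_brackets(value)}]"
end

puts format_result("title", "SMC[231] Console")
# prints: title[SMC\[231\] Console]
```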

add nikto, url guessing, except for certain types of servers, dont guess urls that don't exist. i.e. apache, iis, etc - 

bundle anemone with whatweb so u dont need gems, etc and remove anemone's need for nokogiri

modules should return a static list of types of objects:
	text, version, modules, usernames, userid, 
- cms modules r just returned as a :string at the moment. could be improved on?

extract more info from https certificates, like hostnames

include IP in logs

more output types, like JSON,XML,YAML

make a distinction in colour between  standard webpage things we find all the time like
title, meta generator, md5, server-header, Mailto and other matches, i.e. Joomla.


maybe plugins should return TEXT. i.e. type of default apache,etc

let ppl specify custom functions for plugins - like enumerating all drupal nodes.




BUG https://120.136.48.33/ ERROR: EOF error end of file reached
This redirects but doesn't have a proper certificate



whatweb doesn't understand websites, only URLs
lots of good info in /robots.txt - recognise major versions of drupal
get favicon
get /robots.txt

remove aggression level 2?

include some caching of downloaded links

say whether it has aggressive tests in the plugin list


* doesn't follow redirects (sometimes)



the plugin locking is ugly, might be better to make a new instance of the plugin for each test


should modify whatweb to run as a proxy and invoke wget

fast way:
	use tiny proxy, or similar written in C. 
	capture the data and pass to ruby through a socket.
	ruby just starts the proxy, starts wget and collects the data.


the url guessing sux
eg. oscommerce
http://www.pokengirl.com/cart/catalog/index.php	should guess http://www.pokengirl.com/cart/catalog/admin/login.php instead of http://www.pokengirl.com/admin/login.php
should list the file + dir structure so we can work out the base dir
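
Until the base dir can be worked out properly, one option is to try the guessed path against every directory of the target URL, deepest first. This is only a sketch; candidate_urls is a hypothetical helper.

```ruby
require 'uri'

# candidate_urls is a hypothetical helper. It returns the guessed path
# joined onto each directory level of the target, deepest first.
def candidate_urls(target, guessed_path)
  uri = URI.parse(target)
  dirs = uri.path.split('/').reject { |part| part.empty? }
  dirs.pop unless uri.path.end_with?('/')  # drop the file name, e.g. index.php
  candidates = []
  dirs.length.downto(0) do |depth|
    base = '/' + dirs.first(depth).map { |d| d + '/' }.join
    candidate = uri.dup
    candidate.path = base + guessed_path.sub(%r{\A/}, '')
    candidate.query = nil
    candidates << candidate.to_s
  end
  candidates
end

puts candidate_urls("http://www.pokengirl.com/cart/catalog/index.php", "admin/login.php")
```

For the oscommerce example this tries /cart/catalog/admin/login.php first, then /cart/admin/login.php, and only then /admin/login.php.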


need to log errors. in all registered logs, brief & verbose? separate log file?
add an error_out function to  class OutputBrief, etc.
modify error function to call
	output_list.each do |o|
		o.error_out(target, err )
	end
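
Fleshed out a little, as a sketch. OutputBrief is named in these notes, but this body, the io argument and report_error are assumptions for illustration.

```ruby
require 'stringio'

# Sketch only: a stripped-down OutputBrief that can log errors.
class OutputBrief
  def initialize(io = $stdout)
    @io = io
  end

  # Write one error line to this log.
  def error_out(target, err)
    @io.puts "#{target} ERROR: #{err}"
  end
end

# Send the same error to every registered output.
def report_error(output_list, target, err)
  output_list.each do |o|
    o.error_out(target, err)
  end
end

# Usage: one output on the terminal, one standing in for a log file.
log = StringIO.new
report_error([OutputBrief.new, OutputBrief.new(log)], "http://example.com", "execution expired")
puts log.string
```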



[x] dont sort targets. sorting each host in the input file is unsuitable for long files.
should just read 1 at a time, potentially from stdin

fix GHDB expressions
  --   ghdb "abc def" doesn't match "abc <b>def</b>"
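
One possible fix is to match against the page text with the tags removed. A rough sketch; ghdb_match? is a made-up name and stripping tags with a regexp is an approximation, not a real HTML parser.

```ruby
# ghdb_match? is a hypothetical helper. Tags are removed and whitespace
# collapsed before a case insensitive phrase search.
def ghdb_match?(body, phrase)
  text = body.gsub(/<[^>]*>/, '').gsub(/\s+/, ' ')
  text.downcase.include?(phrase.downcase)
end

puts ghdb_match?("abc <b>def</b>", "abc def")
puts ghdb_match?("abc <b>xyz</b>", "abc def")
```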

detect mobile versions

logs currently only log successful attempts. should be optional

maybe Make account, username, id, etc all username in output?

md5sums of files -identify favicon for mambo, joomla, apple, etc. have @md5_hash for plugins


Use NAMED GROUPS in regular expressions for stuff like version numbers
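
For example, something like the sketch below. Named captures need Ruby 1.9 or later (not available in 1.8). GENERATOR_PATTERN and extract_version are hypothetical names; the pattern is adapted from the WordPress example earlier in these notes.

```ruby
# Hypothetical names for illustration; requires Ruby 1.9+.
GENERATOR_PATTERN = /<meta name="generator" content="WordPress ?(?<version>[0-9.]+)"/

# Returns the captured version string, or nil when the body does not match.
def extract_version(body)
  match = GENERATOR_PATTERN.match(body)
  match && match[:version]
end

puts extract_version('<meta name="generator" content="WordPress 3.0.1" />').inspect
puts extract_version('<title>no generator here</title>').inspect
```

Reading match[:version] avoids having to count capture groups, so the index can't drift when the pattern is edited.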

export plugin regexp matches to XML, separate program. 
move plugin stuff into library

PHP plugin. take version from server meta string or x-powered-by, look for local links ending in .php
plugins for apache leaking win32 / unix, debian, redhat

alternative matches -- support/convert google dork/search stuff like intitle: ,

Colour Brief output according to plugin category & hilight versions

Plugin categories, i.e. javascript libraries for ( mootools, jquery, prototype)
 	server (IIS/ Apache)
 	cgi language (PHP, ASP, CFM)
    CMS (Joomla, Mambo, )   / Blogging Platform (wordpress, typepad) 
    statistics (google analytics, quantcast)

Layers :  	Server, Language, Program, Javascript
Content:	Contact

CMS is a type of Program,
Stats is a type of Javascript or Program

acclipse.rb				CMS
blogger.rb				CMS
blogsmithmedia.rb		company?
drupal.rb				CMS
echo.rb					CMS
generic-server.rb		?
generic-x-powered-by.rb ?
google-analytics2.rb	Stats
google-analytics.rb		Stats
joomla.rb				CMS
jquery.rb				javascript
lightbox.rb				javascript
mailto.rb				contact
mambo.rb				CMS
minify.rb				Program
movable_type.rb			CMS
plone.rb				CMS
prototype.rb			javascript
quantcast.rb			stats
scriptaculous.rb		javascript
typepad.rb				CMS
wordpress.rb			CMS
wordpress-spamfree.rb	?


update plugins, download from website

add plugin maturity, alpha, beta, stable, etc ?
 - be useful when users submit plugins / make plugins on website

use curl lib, curb for:
	add proxy support
 	add authentication basic/digest/form/cookie/ntlm
	change agent behaviour
	timeouts 

let people pipe html + meta data straight from stdin, eg. curl -v securityfocus.com | ./whatweb --stdin

cache pages with sqlite or in files so they only have to be fetched once

make whatweb act as a proxy. so u can spider a site with wget, use firefox, etc.



add output plugins - 
    yml, 
    brief view.  eg. JQuery v2.1.7, Joomla v1.5, probably Drupal.
'probably' refers to the top % match being >50% and <100%. shows version + other info in shorthand, space delimited

test this out on CMS showcase lists to expose errors. some movable type showcase sites are drupal

make simple regexp matches more portable so they can be exported and used in other programs. maybe have an array of regexp matches in the plugins.

firefox plugin that displays the identity in the footer and sends data to my server, like Wappalyzer but better



website
       let ppl make plugins in website.
       form fields = example urls, name, etc.
                     write regular expressions & test which of the
                       examples it matches



What are your thoughts on returning info for hardware? Often there's different information to return, such as hardware model, firmware version, software versions, enabled modules, etc.

The output starts to get messy.
ABC-Hardware[xx.xx.xx][Model: XYZ][Firmware: xx.xx.xx][Modules:asdf,ghkj,zxcv]





Fingerprinting using HTTP headers would be much cleaner if shodan style searches could be performed in a similar way to ghdb.

For example, instead of using:

server=@meta["server"] if @meta.keys.include?("server")
server=@meta["Server"] if @meta.keys.include?("Server")
if server =~ /GoAhead-Webs/i
	m << {:name=>"server string",:string=>server}
end


You could use:

# regex style
{:name=>"server string", :shodan=>/server:[\s]*GoAhead-Webs/i },

# or perhaps plaintext style, case insensitive
{:name=>"server string", :shodan=>"Server: GoAhead-Webs" },


The same applies to cookies:

m=[]
m << {:name=>"PastVisitor Cookie" } if @meta["set-cookie"] =~ /PastVisitor=[0-9]+.*/

becomes:

{:name=>"PastVisitor Cookie", :shodan=>/set-cookie:[\s]*PastVisitor=[0-9]+.*/ },
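
A sketch of how a :shodan match could be evaluated, assuming the headers are available as a hash like @meta. The :shodan key comes from the suggestion above; headers_to_text and shodan_match? are made up for illustration.

```ruby
# Flatten the header hash into "name: value" lines, the way a raw
# HTTP response would show them.
def headers_to_text(meta)
  meta.map { |name, value| "#{name}: #{value}" }.join("\n")
end

# shodan_match? is a hypothetical helper. Regexps are matched as given,
# plain strings are matched case insensitively.
def shodan_match?(meta, pattern)
  text = headers_to_text(meta)
  if pattern.is_a?(Regexp)
    !!(text =~ pattern)
  else
    text.downcase.include?(pattern.to_s.downcase)
  end
end

meta = { "Server" => "GoAhead-Webs", "set-cookie" => "PastVisitor=123; path=/" }
puts shodan_match?(meta, /server:[\s]*GoAhead-Webs/i)
puts shodan_match?(meta, "Server: GoAhead-Webs")
puts shodan_match?(meta, /set-cookie:[\s]*PastVisitor=[0-9]+.*/)
```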


I've noted a few plugins where this could be useful:

~/proj/whatweb-0.4.4$ cat plugins/*.rb | grep -i Shodan
# About 13,446 Shodan results for Server:"AV787 Video Web Server" @ 2010-07-20
# About 1,002 Shodan results for Server:BarracudaHTTP @ 2010-07-24
# About 1,661 shodan results for printer Server:debut @ 2010-07-22
# About 21,276 Shodan results for Server:HP-ChaiSOE ONNECTION @ 2010-07-22
# About 189 Shodan results for Server:i-Catcher Console @ 2010-07-20
# About 94 Shodan results for Server:webcamXP @ 2010-07-24
# About 268 shodan results for Server: Snap Appliance, Inc. @ 2010-07-24
# About 543 Shodan results for Server:NetEVI @ 2010-07-22
# About 50 SHODAN results for Server:"Observer XT (c) Veo" @ 2010-07-18


I think ShodanHQ has gained enough publicity that using "shodan=>" would make sense.


Just a suggestion. Also I haven't been able to fingerprint using copyright symbols and different languages have been a challenge. UTF-8 support would be awesome :D




BUGS
auto colour - detect if it's being piped to a file or STDOUT, if to a file then turn off colour
nicer error handling when fed bad input, cmdline options, etc
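
For the auto colour item, IO#tty? is probably enough. A sketch; use_colour? and colourise are hypothetical names.

```ruby
# use_colour? is a hypothetical helper. IO#tty? is false when output is
# piped or redirected to a file.
def use_colour?(io = $stdout)
  io.tty?
end

# Wrap text in an ANSI colour code only when writing to a terminal.
def colourise(text, code, io = $stdout)
  use_colour?(io) ? "\e[#{code}m#{text}\e[0m" : text
end

puts colourise("Joomla", 31)  # red on a terminal, plain when piped
```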







